Methods and compositions for whole transcriptome amplification

ABSTRACT

The disclosure provides for methods, compositions, systems, devices, and kits for whole transcriptome amplification using stochastic barcodes.

REFERENCE TO SEQUENCE LISTING

The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled Sequence_Listing.TXT, created Mar. 1, 2022, which is 8 Kb in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

BACKGROUND

Methods and compositions for labeling nucleic acid molecules for amplification or sequencing have been developed. Sometimes a sample provides too little starting material for counting the nucleic acid molecules in the sample. Amplification can increase the amount of material available for downstream analysis methods, such as stochastic counting. Described herein are methods, compositions, kits, and systems for whole transcriptome amplification (WTA), including counting of nucleic acid molecules in samples using stochastic barcodes.

SUMMARY

Some embodiments disclosed herein provide methods for labeling a plurality of targets from a sample, comprising: hybridizing the plurality of targets from the sample with a plurality of nucleic acids each comprising a first universal label; extending the plurality of nucleic acids to generate a plurality of first strand polynucleotides; synthesizing a plurality of second strand polynucleotides using the plurality of first strand polynucleotides as templates to generate a plurality of double-stranded polynucleotides; ligating an adaptor to the plurality of double-stranded polynucleotides, wherein said adaptor comprises a second universal label; and amplifying the plurality of double-stranded polynucleotides using the first universal label and the second universal label, thereby generating a plurality of amplicons comprising the plurality of targets. In some embodiments, each of the plurality of nucleic acids comprises a stochastic barcode. In some embodiments, said stochastic barcode comprises a molecular label, a cellular label, a target-specific region, or any combination thereof. In some embodiments, said target-specific region comprises an oligo dT sequence, a random sequence, a target-specific sequence, or any combination thereof. In some embodiments, the plurality of targets are DNAs. In some embodiments, the plurality of targets are mRNAs. In some embodiments, synthesizing the plurality of second strand polynucleotides comprises nicking the plurality of mRNAs with an RNase, thereby generating one or more mRNA primers. In some embodiments, said RNase is RNaseH. In some embodiments, at least one of said one or more mRNA primers is at least 15 nucleotides in length. In some embodiments, the methods further comprise extending said one or more mRNA primers with a polymerase, thereby generating extended segments. In some embodiments, said polymerase has 5′-3′ exonuclease activity. In some embodiments, said polymerase comprises DNA Pol I.
In some embodiments, the methods further comprise ligating said extended segments with a ligase, thereby generating a second strand polynucleotide. In some embodiments, the methods further comprise extending said one or more mRNA primers with a strand displacing polymerase, thereby generating an extended second strand. In some embodiments, the methods further comprise removing said one or more mRNA primers from the extended second strand. In some embodiments, the plurality of targets are nucleic acids from a single cell. In some embodiments, the plurality of amplicons comprises a whole transcriptome amplification (WTA) product. In some embodiments, said adaptor is a double stranded polynucleotide. In some embodiments, said adaptor comprises an AsiSI site. In some embodiments, said adaptor is a partially double stranded polynucleotide. In some embodiments, the methods further comprise blunt ending at least one of said plurality of double-stranded polynucleotides. In some embodiments, the methods further comprise adding an A overhang to said plurality of double-stranded polynucleotides. In some embodiments, the methods further comprise synthesizing a plurality of third strand polynucleotides from the plurality of second strand polynucleotides. In some embodiments, each of the plurality of nucleic acids is immobilized on a solid support. In some embodiments, said solid support is a bead. In some embodiments, at least two of said plurality of nucleic acids immobilized on a single solid support comprise different molecular labels. In some embodiments, said plurality of nucleic acids attached to a solid support comprises the same cellular label. In some embodiments, the sample comprises a single cell. In some embodiments, the sample comprises a plurality of cells. In some embodiments, the first universal label and the second universal label are the same.
In some embodiments, the first universal label and the second universal label are different. In some embodiments, each one of the plurality of amplicons comprises at least part of the first universal label, the second universal label, or both.

Some embodiments disclosed herein provide methods for labeling a plurality of targets from a sample comprising: hybridizing the plurality of targets from the sample with a plurality of nucleic acids each comprising a first universal label; extending the plurality of nucleic acids to generate a plurality of first strand polynucleotides, wherein the plurality of first strand polynucleotides and the plurality of targets form a plurality of double-stranded polynucleotides; fragmenting the plurality of double-stranded polynucleotides using a first transposome to generate a plurality of double stranded polynucleotides that are ligated with a first adaptor, wherein said first adaptor comprises a second universal label, and wherein the first transposome comprises a first transposase and the first adaptor; and amplifying the plurality of double-stranded polynucleotides that are ligated with the first adaptor using the first universal label and the second universal label, thereby generating a plurality of amplicons comprising the plurality of targets. In some embodiments, the methods further comprise fragmenting the plurality of double-stranded polynucleotides using a second transposome to generate a plurality of double stranded polynucleotides that are ligated with the first adaptor and a second adaptor, wherein said second adaptor comprises a third universal label, and wherein the second transposome comprises a second transposase and the second adaptor. In some embodiments, the first transposase and the second transposase are the same. In some embodiments, the first transposase and the second transposase are different. In some embodiments, the first adaptor and the second adaptor are the same. In some embodiments, the first adaptor and the second adaptor are different. In some embodiments, the methods further comprise synthesizing a plurality of second strand polynucleotides using the plurality of first strand polynucleotides as templates.
In some embodiments, said first adaptor comprises a stochastic barcode. In some embodiments, said second universal label is a transposome sequence. In some embodiments, said first adaptor comprises a sequencing primer binding site. In some embodiments, said sequencing primer is P7 or P5. In some embodiments, said second adaptor comprises a stochastic barcode. In some embodiments, said third universal label is a transposome sequence. In some embodiments, said second adaptor comprises a sequencing primer binding site. In some embodiments, said sequencing primer is P7 or P5. In some embodiments, each of the plurality of nucleic acids is immobilized on a solid support. In some embodiments, said solid support is a bead. In some embodiments, the methods comprise purifying double-stranded polynucleotides that are immobilized on beads, wherein the double-stranded polynucleotides are double-stranded polynucleotides that are ligated with the first adaptor, the second adaptor, or both.

Some embodiments disclosed herein provide kits comprising: a plurality of solid supports, wherein each of the solid supports comprises a plurality of nucleic acids each comprising a first universal label sequence; a nucleic acid adaptor comprising a second universal label sequence; and an enzyme, wherein the enzyme is a ligase or a transposase. In some embodiments, said enzyme is a ligase. In some embodiments, said enzyme is a transposase. In some embodiments, said plurality of nucleic acids comprises different molecular labels. In some embodiments, said plurality of nucleic acids comprises the same cellular label. In some embodiments, said plurality of solid supports is a plurality of beads. In some embodiments, the kits comprise one or more additional enzymes selected from the group consisting of a reverse transcriptase, a DNA polymerase, an RNase, an exonuclease, or any combination thereof. In some embodiments, the kits further comprise a substrate. In some embodiments, said substrate comprises microwells.

Some embodiments disclosed herein provide methods for labeling a plurality of target sequences from a single cell, comprising: providing the single cell to a partition comprising a solid support immobilized with a plurality of nucleic acids each comprising a first universal label; lysing the single cell to release the plurality of target sequences; hybridizing the plurality of target sequences from the single cell with the plurality of nucleic acids; extending the plurality of nucleic acids to generate a plurality of first strand polynucleotides; adding an adaptor sequence to the plurality of first strand polynucleotides, wherein said adaptor sequence comprises a second universal label; and amplifying the plurality of first strand polynucleotides using the first universal label and the second universal label, thereby generating a plurality of amplicons comprising the plurality of target sequences. In some embodiments, said adaptor sequence is added by a transposome. In some embodiments, said transposome comprises a transposase and the adaptor sequence. In some embodiments, said adaptor is added by a ligation step. In some embodiments, the plurality of target sequences are mRNAs. In some embodiments, the methods further comprise synthesizing a plurality of second strand polynucleotides using the plurality of first strand polynucleotides as templates. In some embodiments, synthesizing the plurality of second strand polynucleotides comprises nicking the plurality of mRNAs with an RNase, thereby generating one or more mRNA primers. In some embodiments, said RNase is RNaseH. In some embodiments, at least one of said one or more mRNA primers is at least 15 nucleotides in length. In some embodiments, the methods further comprise extending said one or more mRNA primers with a polymerase, thereby generating extended segments. In some embodiments, said polymerase has 5′-3′ exonuclease activity. In some embodiments, said polymerase comprises DNA Pol I.
In some embodiments, the methods further comprise ligating said extended segments with a ligase, thereby generating a second strand polynucleotide. In some embodiments, the methods further comprise extending said one or more mRNA primers with a strand displacing polymerase, thereby generating an extended second strand. In some embodiments, the methods further comprise removing said one or more mRNA primers from the extended second strand. In some embodiments, the plurality of amplicons comprises a whole transcriptome amplification (WTA) product. In some embodiments, the WTA product comprises at least 10% of the mRNAs in the single cell. In some embodiments, the WTA product comprises at least 50% of the mRNAs in the single cell. In some embodiments, the WTA product comprises at least 90% of the mRNAs in the single cell. In some embodiments, each of the plurality of amplicons comprises a stochastic barcode. In some embodiments, said stochastic barcode comprises a molecular label, a cellular label, a target-specific region, or any combination thereof. In some embodiments, the methods further comprise sequencing the plurality of amplicons to generate a plurality of sequencing reads comprising a molecular label, a cellular label, a target-specific region, or any combination thereof. In some embodiments, the methods further comprise analyzing the plurality of sequencing reads using the cellular label. In some embodiments, the methods further comprise analyzing the plurality of sequencing reads using the molecular label. In some embodiments, said partition is a microwell.
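
As a rough illustration of analyzing sequencing reads using the cellular label and the molecular label, the following Python sketch groups reads by cell. The read layout (an 8-nucleotide cellular label followed by an 8-nucleotide molecular label and then target sequence), the label lengths, and the function name are assumptions made only for this example; the disclosure does not fix them.

```python
from collections import defaultdict

# Assumed, illustrative layout: cellular label, then molecular label, then target.
CELL_LABEL_LEN = 8
MOLECULAR_LABEL_LEN = 8


def group_reads_by_cell(reads):
    """Group (molecular label, target sequence) pairs by cellular label."""
    by_cell = defaultdict(list)
    prefix_len = CELL_LABEL_LEN + MOLECULAR_LABEL_LEN
    for read in reads:
        if len(read) <= prefix_len:
            continue  # too short to hold both labels plus any target sequence
        cell = read[:CELL_LABEL_LEN]
        molecular = read[CELL_LABEL_LEN:prefix_len]
        target = read[prefix_len:]
        by_cell[cell].append((molecular, target))
    return dict(by_cell)


# Made-up reads: two from one cell, one from another, one too short to parse.
reads = [
    "AAAACCCC" + "GGGGTTTT" + "ACGTACGT",
    "AAAACCCC" + "TTTTGGGG" + "ACGTACGT",
    "CCCCAAAA" + "GGGGTTTT" + "TTGACCAA",
    "ACGT",
]
grouped = group_reads_by_cell(reads)
```

Reads sharing a cellular label are attributed to the same partition, and the molecular labels within each group can then be used to distinguish individual captured molecules.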

Some embodiments disclosed herein provide systems for generating a whole transcriptome amplification (WTA) product from a plurality of single cells comprising: a substrate comprising a plurality of partitions each comprising a single cell and a solid support immobilized with a plurality of nucleic acids, wherein each of the plurality of nucleic acids comprises: a first universal label; a cellular label; and a molecular label; a nucleic acid adaptor comprising a second universal label sequence; and an enzyme, wherein the enzyme is a ligase or a transposase. In some embodiments, the substrate is a microwell array. In some embodiments, said plurality of cells comprises one or more different cell types. In some embodiments, said one or more cell types are selected from the group consisting of: brain cells, heart cells, cancer cells, circulating tumor cells, organ cells, epithelial cells, metastatic cells, benign cells, primary cells, and circulatory cells, or any combination thereof.

In one aspect, the disclosure provides for a composition comprising: a quasi-symmetric stochastically barcoded nucleic acid comprising: a stochastic barcode sequence comprising a first universal label; a second strand synthesis primer sequence comprising a second universal label, wherein the second universal label is at most 99% identical to the first universal label. In some embodiments, the quasi-symmetric stochastically barcoded nucleic acid is capable of undergoing suppression PCR. In some embodiments, the strand synthesis primer sequence further comprises a restriction site. In some embodiments, the second universal label is a subset of the first universal label. In some embodiments, the second universal label is shorter than the first universal label. In some embodiments, the second universal label is shorter than the first universal label by at least 1 nucleotide. In some embodiments, the second universal label is shorter than the first universal label by at least 2 nucleotides. In some embodiments, the at least 2 nucleotides are located at the 5′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located at the 3′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located in between the 3′ and 5′ ends of the second universal label. In some embodiments, the second universal label comprises a mismatch compared to the first universal label. In some embodiments, the second universal label hybridizes to at least 80% of the first universal label. In some embodiments, the second universal label is at most 99% identical to the first universal label over at least 90% of the length of the first universal label. In some embodiments, the second universal label is not identical to the first universal label. In some embodiments, the stochastic barcode comprises a target binding region, a molecular label, a cellular label, and a universal label, or any combination thereof.
In some embodiments, the target-binding region comprises a sequence selected from the group consisting of: oligo dT, a random multimer, and a gene-specific sequence. In some embodiments, the first universal label is a first sequencing read primer sequence. In some embodiments, the whole transcriptome amplification tag further comprises a sequence complementary to a homopolymer tail, a gene-specific sequence, or a random multimer. In some embodiments, the stochastically barcoded nucleic acid comprises a homopolymer tail. In some embodiments, the second strand synthesis primer sequence further comprises a restriction endonuclease cleavage site. In some embodiments, the first universal label is at one end of the quasi-symmetric stochastically barcoded nucleic acid and wherein the second universal label is at another end of the quasi-symmetric stochastically barcoded nucleic acid. In some embodiments, the first universal label is at the 3′ end of the quasi-symmetric stochastically barcoded nucleic acid and the second universal label is at the 5′ end of the quasi-symmetric stochastically barcoded nucleic acid. In some embodiments, the quasi-symmetric stochastically barcoded nucleic acid is single stranded. In some embodiments, the quasi-symmetric stochastically barcoded nucleic acid is double-stranded.
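
The relationship between the two universal labels (a second label that is a shortened subset of the first and therefore at most 99% identical to it) can be checked with a short Python sketch. The label sequence, the function names, and the way identity is computed here (an ungapped, left-anchored comparison divided by the length of the first label) are all assumptions for illustration; the disclosure does not specify an identity calculation, and an alignment-based measure could be substituted.

```python
def percent_identity(first, second):
    """Percent of positions in `first` matched by `second`, compared without gaps
    from the first position (an assumed, simplified identity measure)."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


def is_quasi_symmetric(first, second, max_identity=99.0):
    """True when the labels differ and are at most `max_identity` percent identical."""
    return first != second and percent_identity(first, second) <= max_identity


# Hypothetical 20-nucleotide first universal label.
first_label = "ACGTACGTACGTACGTACGT"
# Second label: the first label shortened by two nucleotides at its 3' end.
second_label = first_label[:-2]

identity = percent_identity(first_label, second_label)
is_subset = second_label in first_label
```

In this example the second label is a subset of the first, is shorter by two nucleotides, and scores 90% identity under the assumed measure, so the pair would count as quasi-symmetric rather than symmetric.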

In one aspect, the disclosure provides for a method for breaking symmetry in a barcoded nucleic acid comprising: generating a quasi-symmetric stochastically barcoded nucleic acid, comprising a stochastic barcode sequence, a first universal label and a second universal label, wherein the second universal label is at most 99% identical to the first universal label; and identifying 3′ and 5′ sequencing reads from the quasi-symmetric stochastically barcoded nucleic acid, thereby breaking the symmetry of the quasi-symmetric stochastically barcoded nucleic acid. In some embodiments, the generating comprises contacting a target RNA with a stochastic barcode. In some embodiments, the stochastic barcode comprises a cellular label, a molecular label, the first universal label, and a target-binding region, or any combination thereof. In some embodiments, the method further comprises reverse transcribing the stochastic barcode, thereby generating a stochastically labelled cDNA comprising a complementary sequence of the target RNA. In some embodiments, the method further comprises appending a 3′ homopolymer tail to the stochastically labelled cDNA. In some embodiments, the 3′ homopolymer tail is from 2-10 nucleotides in length. In some embodiments, the 3′ homopolymer tail is a poly A tail. In some embodiments, the method further comprises performing second strand synthesis with a second strand synthesis primer comprising a sequence complementary to the homopolymer tail and the second universal label, thereby generating the quasi-symmetric stochastically barcoded nucleic acid. In some embodiments, the sequence complementary to the homopolymer tail comprises a poly T sequence. In some embodiments, the sequence complementary to the homopolymer tail comprises a poly U sequence. In some embodiments, the primer further comprises a restriction endonuclease cleavage site.
In some embodiments, the method further comprises amplifying the quasi-symmetric stochastically barcoded nucleic acid with a whole transcriptome amplification primer. In some embodiments, the whole transcriptome amplification primer hybridizes to a portion of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at least 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at most 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to about 18 nucleotides from the 5′ end of the second universal label. In some embodiments, the method further comprises cleaving the restriction endonuclease cleavage site with a restriction endonuclease, thereby generating an asymmetric stochastically barcoded nucleic acid. In some embodiments, the asymmetric stochastically barcoded nucleic acid does not comprise the second universal label. In some embodiments, the generating the 3′ and 5′ sequencing reads comprises contacting the asymmetric stochastically barcoded nucleic acid with a degenerate primer, thereby generating an asymmetric read product. In some embodiments, the degenerate primer comprises a gene-specific sequence, a random-multimer sequence, and a third universal label, or any combination thereof. In some embodiments, the third universal label is a sequencing primer binding site. In some embodiments, the third universal label is different than the first universal label. In some embodiments, the method further comprises amplifying the asymmetric read product with library amplification primers. In some embodiments, a primer of the library amplification primers does not bind to the second universal label. In some embodiments, the library amplification primers bind to the first universal label and the third universal label.
In some embodiments, the method further comprises performing second strand synthesis with a degenerate primer comprising the second universal label and a random multimer sequence, thereby generating the quasi-symmetric stochastically barcoded nucleic acid. In some embodiments, the quasi-symmetric stochastically barcoded nucleic acid is single-stranded. In some embodiments, the method further comprises amplifying the quasi-symmetric stochastically barcoded nucleic acid with a whole transcriptome amplification primer. In some embodiments, the whole transcriptome amplification primer hybridizes to a portion of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at least 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at most 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to about 18 nucleotides from the 5′ end of the second universal label. In some embodiments, the identifying comprises sequencing the 3′ and 5′ sequencing reads. In some embodiments, the 3′ sequencing reads comprise a stochastic barcode sequence. In some embodiments, the 3′ sequencing reads further comprise the first universal label, a cellular label, and a portion of the sequence of the nucleic acid, or any combination thereof. In some embodiments, the 5′ sequencing reads comprise a restriction endonuclease cleavage site, and the second universal label, or any combination thereof. In some embodiments, the number of 3′ sequencing reads is at least 2-fold the number of 5′ sequence reads. In some embodiments, the number of 5′ sequencing reads is less than 1,000. In some embodiments, the generating comprises adding a cleavage site and the second universal label through template switching. In some embodiments, the generating comprises adding a cleavage site and the second universal label through in vitro transcription.
In some embodiments, the identifying sequencing reads comprises determining the sequence of a portion of the sequence of the stochastic barcode and the nucleic acid. In some embodiments, the sequencing reads are at least 25 nucleotides in length. In some embodiments, the sequencing reads are at least 75 nucleotides in length. In some embodiments, the barcoded nucleic acid is from a sample. In some embodiments, the sample comprises a single cell. In some embodiments, the sample comprises a plurality of cells. In some embodiments, the plurality of cells comprises one or more different cell types. In some embodiments, the one or more cell types are selected from the group consisting of: brain cells, heart cells, cancer cells, circulating tumor cells, organ cells, epithelial cells, metastatic cells, benign cells, primary cells, and circulatory cells, or any combination thereof. In some embodiments, the sample comprises a solid tissue. In some embodiments, the sample is obtained from a subject. In some embodiments, the subject is a subject selected from the group consisting of: a human, a mammal, a dog, a rat, a mouse, a fish, a fly, a worm, a plant, a fungus, a bacterium, a virus, a vertebrate, and an invertebrate. In some embodiments, the targets are deoxyribonucleic acid molecules. In some embodiments, the method further comprises isolating a single cell and a single bead into a plurality of wells on a substrate, wherein a single well of the plurality of wells has the single cell and the single bead. In some embodiments, the substrate comprises at least 1,000 wells. In some embodiments, the single bead comprises a plurality of stochastic barcodes. In some embodiments, the method further comprises hybridizing targets from the single cell to the stochastic barcodes. In some embodiments, individual stochastic barcodes of the plurality of stochastic barcodes have different molecular labels.
In some embodiments, individual stochastic barcodes of the plurality of stochastic barcodes have the same cellular label. In some embodiments, individual stochastic barcodes of the plurality of stochastic barcodes comprise a molecular label, a cellular label, the first universal label, and a target-binding region, or any combination thereof. In some embodiments, stochastic barcodes in different wells of the plurality of wells have different cellular labels. In some embodiments, the method further comprises removing the bead. In some embodiments, the removing is performed with a magnet.
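
One way to picture the identification of 3′ and 5′ sequencing reads is a simple classifier that looks for the first or second universal label at the start of each read. The sketch below is only an illustration under stated assumptions: the label sequences are made up, the second label is taken to be the first label shortened by two nucleotides at its 5′ end, and each read is assumed to begin with its universal label, none of which is specified by the disclosure.

```python
from collections import Counter

# Hypothetical first universal label (assumption, not from the disclosure).
FIRST_UNIVERSAL_LABEL = "TGCAGGCTAACGTTCAGAGT"
# Assumed second label: the first label minus two nucleotides at its 5' end.
SECOND_UNIVERSAL_LABEL = FIRST_UNIVERSAL_LABEL[2:]


def classify_read(read):
    """Label a read as 3' (first universal label) or 5' (second universal label)."""
    if read.startswith(FIRST_UNIVERSAL_LABEL):
        return "3prime"
    if read.startswith(SECOND_UNIVERSAL_LABEL):
        return "5prime"
    return "unassigned"


# Made-up reads: three 3' reads, one 5' read, and one that matches neither label.
reads = [
    FIRST_UNIVERSAL_LABEL + "AAAACCCCGGGGTTTT",
    FIRST_UNIVERSAL_LABEL + "CCCCAAAATTTTGGGG",
    FIRST_UNIVERSAL_LABEL + "GGGGTTTTAAAACCCC",
    SECOND_UNIVERSAL_LABEL + "GCGATCGCACGT",
    "NNNNACGT",
]
read_counts = Counter(classify_read(read) for read in reads)

# Check the "at least 2-fold" relationship between 3' and 5' read counts.
three_prime_enriched = read_counts["3prime"] >= 2 * read_counts["5prime"]
```

Because the two labels differ, a read can be assigned to one end of the molecule from its label alone, which is the sense in which the symmetry is broken.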

In one aspect, the disclosure provides for a kit comprising: a set of stochastic barcodes, wherein each stochastic barcode of the set of stochastic barcodes comprises a target-specific region; a molecular label; a cellular label; and a first universal label; a second strand synthesis primer comprising a second universal label, wherein the second universal label is at most 99% identical to the first universal label; and an enzyme. In some embodiments, individual stochastic barcodes of the set of stochastic barcodes comprise different molecular labels. In some embodiments, individual stochastic barcodes of the set of stochastic barcodes comprise the same cellular labels. In some embodiments, the target-specific region comprises a sequence selected from the group consisting of: an oligo dT, a random multimer, and a gene-specific sequence. In some embodiments, the enzyme is selected from the group consisting of: a reverse transcriptase, a terminal transferase, an RNase inhibitor, a DNA polymerase, a restriction endonuclease, and an exonuclease, or any combination thereof. In some embodiments, the kit further comprises a whole transcriptome amplification primer. In some embodiments, the whole transcriptome amplification primer hybridizes to a portion of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at least 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at most 18 nucleotides of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to about 18 nucleotides from the 5′ end of the second universal label. In some embodiments, the second universal label differs by at least one nucleotide from the first universal label. In some embodiments, the second universal label is shorter than the first universal label by at least one nucleotide.
In some embodiments, the second universal label is shorter than the first universal label by at least two nucleotides. In some embodiments, the second universal label is shorter than the first universal label by two nucleotides. In some embodiments, the second universal label comprises a sequence that is a subset of the sequence of the first universal label. In some embodiments, the second strand synthesis primer comprises a cleavage site. In some embodiments, the cleavage site comprises a restriction endonuclease cleavage site. In some embodiments, the kit further comprises instructions for use. In some embodiments, the kit further comprises one or more universal primers. In some embodiments, the one or more universal primers are adapted to hybridize to the first and second universal label, the first universal label, or the second universal label. In some embodiments, the kit further comprises a set of gene-specific primers. In some embodiments, the kit further comprises reagents for a reverse transcription reaction. In some embodiments, the kit further comprises reagents for a polymerase chain reaction. In some embodiments, the kit further comprises reagents for a restriction endonuclease cleavage reaction. In some embodiments, the set of stochastic barcodes are attached to a solid support. In some embodiments, the solid support comprises a bead. In some embodiments, the kit further comprises a substrate. In some embodiments, the substrate comprises microwells.

In one aspect, the disclosure provides for a method for whole transcriptome amplification using adaptor ligation comprising: contacting one or more mRNA targets in a sample with a nucleic acid comprising a first universal label; performing reverse transcription and second strand synthesis, thereby generating a labeled cDNA; ligating an adaptor to the labeled cDNA, thereby generating a quasi-symmetric cDNA, wherein the adaptor comprises a second universal label; and amplifying the adaptor ligated cDNA with a whole transcriptome amplification primer, thereby generating a whole-transcriptome amplified product. In some embodiments, the nucleic acid comprises a stochastic barcode. In some embodiments, the stochastic barcode comprises a molecular label, a cellular label, a target-specific region, or any combination thereof. In some embodiments, the target-specific region comprises an oligo dT sequence. In some embodiments, the second strand synthesis comprises nicking the mRNA with an RNase, thereby generating mRNA primers. In some embodiments, the RNase is RNaseH. In some embodiments, the mRNA primers are at least 15 nucleotides in length. In some embodiments, the method further comprises extending the mRNA primers with a polymerase, thereby generating extended segments. In some embodiments, the polymerase comprises 5′-3′ exonuclease activity. In some embodiments, the polymerase comprises DNA Pol I. In some embodiments, the method further comprises ligating the extended segments with a ligase, thereby generating a second strand. In some embodiments, the adaptor is double stranded. In some embodiments, the adaptor comprises a restriction endonuclease cleavage site. In some embodiments, the second universal label is a subset of the first universal label. In some embodiments, the second universal label is shorter than the first universal label. In some embodiments, the second universal label is shorter than the first universal label by at least 1 nucleotide.
In some embodiments, the second universal label is shorter than the first universal label by at least 2 nucleotides. In some embodiments, the at least 2 nucleotides are located at the 5′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located at the 3′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located in between the 3′ and 5′ ends of the second universal label. In some embodiments, the second universal label comprises a mismatch compared to the first universal label. In some embodiments, the second universal label hybridizes to at least 80% of the first universal label. In some embodiments, the second universal label is at most 99% identical to the first universal label over at least 90% of the length of the first universal label. In some embodiments, the second universal label is not identical to the first universal label. In some embodiments, the ligating comprises ligating the adaptor to both strands of the labeled cDNA. In some embodiments, the adaptor comprises a single 5′ phosphorylation site. In some embodiments, after the ligating, the quasi-symmetric cDNA comprises the same sequence at the 3′ end of each strand. In some embodiments, the same sequence comprises a second universal primer sequence. In some embodiments, the whole transcriptome amplification primer comprises a sequence complementary to the second universal primer sequence on the adaptor. In some embodiments, the whole transcriptome amplification primer comprises a sequence complementary to the first universal primer sequence on the adaptor. In some embodiments, the amplifying comprises suppressive PCR. In some embodiments, the amplifying comprises semi-suppressive PCR. In some embodiments, the amplifying comprises linear amplification of one strand and exponential amplification of the other strand of the quasi-symmetric cDNA. In some embodiments, the method is not gene-specific. In some embodiments, the amplifying is not gene-specific.
In some embodiments, at least 5% of the mRNA targets are amplified. In some embodiments, at least 10% of the mRNA targets are amplified. In some embodiments, at least 15% of the mRNA targets are amplified. In some embodiments, the method further comprises sequencing the whole transcriptome amplified product. In some embodiments, the method further comprises counting the number of molecules of the mRNA targets. In some embodiments, the counting comprises counting the number of unique molecular labels for each whole transcriptome amplified product with the sequence of the mRNA target.

In one aspect, the disclosure provides for a method for second strand synthesis comprising: contacting an mRNA target in a sample with a nucleic acid comprising a first universal label; reverse transcribing the mRNA target into a labeled single-stranded cDNA; synthesizing a second strand with a strand displacing polymerase off the labeled single-stranded cDNA; and generating a third strand from the second strand with a sequence comprising the first universal label, thereby generating a double-stranded labeled cDNA. In some embodiments, the nucleic acid comprises a stochastic barcode. In some embodiments, the stochastic barcode comprises a molecular label, a cellular label, a target-specific region, and a universal label, or any combination thereof. In some embodiments, the contacting comprises hybridizing a target-specific region of the nucleic acid to the mRNA. In some embodiments, the target-specific region comprises an oligo dT sequence. In some embodiments, after the reverse transcribing, the labeled single-stranded cDNA is hybridized to the mRNA target. In some embodiments, the synthesizing comprises nicking the mRNA with an RNase, thereby generating mRNA primers. In some embodiments, the RNase is RNaseH. In some embodiments, the mRNA primers are at least 15 nucleotides in length. In some embodiments, the method further comprises extending the mRNA primers with the strand displacing polymerase, thereby generating extended segments. In some embodiments, the extended segments comprise the mRNA primer. In some embodiments, the method further comprises removing the mRNA primer. In some embodiments, the removing comprises removing with an exonuclease. In some embodiments, the generating comprises extending the first universal label to incorporate the sequence of the second strand. In some embodiments, the method further comprises blunting the end of the double-stranded cDNA. In some embodiments, the method further comprises adding an A overhang to the double-stranded cDNA. 
In some embodiments, the method further comprises ligating an adaptor to the double-stranded cDNA, thereby generating a quasi-symmetric cDNA. In some embodiments, the adaptor is double stranded. In some embodiments, the adaptor comprises a restriction endonuclease cleavage site. In some embodiments, the adaptor comprises a second universal label. In some embodiments, the second universal label is a subset of the first universal label. In some embodiments, the second universal label is shorter than the first universal label. In some embodiments, the second universal label is shorter than the first universal label by at least 1 nucleotide. In some embodiments, the second universal label is shorter than the first universal label by at least 2 nucleotides. In some embodiments, the at least 2 nucleotides are located at the 5′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located at the 3′ end of the second universal label. In some embodiments, the at least 2 nucleotides are located between the 3′ and 5′ ends of the second universal label. In some embodiments, the second universal label comprises a mismatch compared to the first universal label. In some embodiments, the second universal label hybridizes to at least 80% of the first universal label. In some embodiments, the second universal label is at most 99% identical to the first universal label over at least 90% of the length of the first universal label. In some embodiments, the second universal label is not identical to the first universal label. In some embodiments, the ligating comprises ligating the adaptor to both strands of the labeled cDNA. In some embodiments, after the ligating, the adaptor ligated cDNA comprises the same sequence at the 3′ end of each strand. In some embodiments, the same sequence comprises a second universal primer sequence. In some embodiments, the method further comprises amplifying the quasi-symmetric cDNA with a whole transcriptome amplification primer. 
In some embodiments, the whole transcriptome amplification primer comprises a sequence complementary to a second universal primer sequence on the adaptor. In some embodiments, the amplifying comprises suppressive PCR. In some embodiments, the amplifying comprises semi-suppressive PCR. In some embodiments, the amplifying comprises linear amplification of one strand and exponential amplification of the other strand of the adaptor ligated cDNA. In some embodiments, the method is not gene-specific. In some embodiments, the amplifying is not gene-specific.

In one aspect, the disclosure provides for a kit comprising: a set of stochastic barcodes, wherein each stochastic barcode of the set of stochastic barcodes comprises a target-specific region, a molecular label, a cellular label, and a first universal label; an adaptor comprising a second universal label, wherein the second universal label is at most 99% identical to the first universal label; and an enzyme. In some embodiments, individual stochastic barcodes of the set of stochastic barcodes comprise different molecular labels. In some embodiments, individual stochastic barcodes of the set of stochastic barcodes comprise the same cellular labels. In some embodiments, the target-specific region comprises a sequence selected from the group consisting of: an oligo dT, a random multimer, and a gene-specific sequence. In some embodiments, the enzyme is selected from the group consisting of: a reverse transcriptase, a terminal transferase, an RNase inhibitor, a DNA polymerase, an RNase, a restriction endonuclease, and an exonuclease, or any combination thereof. In some embodiments, the kit further comprises a whole transcriptome amplification primer. In some embodiments, the whole transcriptome amplification primer hybridizes to a portion of the second universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at least 10 nucleotides of the first universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to at most 10 nucleotides of the first universal label. In some embodiments, the whole transcriptome amplification primer hybridizes to about 10 nucleotides from the 5′ end of the first universal label. In some embodiments, the second universal label differs by at least one nucleotide from the first universal label. In some embodiments, the second universal label is shorter than the first universal label by at least one nucleotide. 
In some embodiments, the second universal label is shorter than the first universal label by at least two nucleotides. In some embodiments, the second universal label is shorter than the first universal label by two nucleotides. In some embodiments, the second universal label comprises a sequence that is a subset of the sequence of the first universal label. In some embodiments, the adaptor comprises a cleavage site. In some embodiments, the cleavage site comprises a restriction endonuclease cleavage site. In some embodiments, the kit further comprises instructions for use. In some embodiments, the kit further comprises one or more universal primers. In some embodiments, the one or more universal primers are adapted to hybridize to the first and second universal label, the first universal label, or the second universal label. In some embodiments, the kit further comprises reagents for a reverse transcription reaction. In some embodiments, the kit further comprises reagents for a polymerase chain reaction. In some embodiments, the kit further comprises reagents for a restriction endonuclease cleavage reaction. In some embodiments, the set of stochastic barcodes is attached to a solid support. In some embodiments, the solid support comprises a bead. In some embodiments, the kit further comprises a substrate. In some embodiments, the substrate comprises microwells.

In one aspect, the disclosure provides for a method for generating a strand-specific sequencing library comprising: contacting an RNA fragment with a primer comprising a first sequence; generating a double-stranded cDNA that incorporates the first sequence; and ligating an adaptor to the double-stranded cDNA, thereby generating an asymmetric adaptor ligated cDNA, wherein the sequencing reads of the asymmetric adaptor ligated cDNA indicate strand specificity. In some embodiments, the first sequence comprises a portion of a first sequencing primer sequence. In some embodiments, the first sequence comprises a sample barcode. In some embodiments, the first sequence comprises a molecular barcode. In some embodiments, the generating comprises reverse transcription, second strand synthesis. In some embodiments, the adaptor is double-stranded. In some embodiments, the adaptor ligates to the 3′ end, the 5′ end, or both the 3′ and 5′ end of the double-stranded cDNA. In some embodiments, the adaptor comprises a second sequence. In some embodiments, the second sequence comprises a portion of a second sequencing primer sequence. In some embodiments, the second sequence comprises a sample barcode. In some embodiments, the method further comprises amplifying the asymmetric adaptor ligated cDNA, thereby generating asymmetric amplicons. In some embodiments, the amplifying comprises PCR amplification with a first primer and a second primer. In some embodiments, the first primer hybridizes to at least a portion of the first sequence. In some embodiments, the second primer hybridizes to at least a portion of a second sequence. In some embodiments, the first and second primers comprise additional sequences to be added to the adaptor ligated cDNA. In some embodiments, the additional sequences comprise additional flow cell sequences. In some embodiments, the additional sequences comprise additional sequencing primer sequences. In some embodiments, the additional sequences comprise sample barcodes. 
In some embodiments, the method further comprises incorporating a universal sequence into the RNA. In some embodiments, the incorporating comprises using a PolyA polymerase. In some embodiments, some of the reads incorporate the sequence of the adaptor. In some embodiments, the reads correspond to a first strand of the double-stranded cDNA. In some embodiments, some of the reads incorporate the sequence of the first sequence. In some embodiments, the reads correspond to a second strand of the double-stranded cDNA. In some embodiments, the strand specificity is determined by whether a read of the sequencing reads comprises the sequence of the adaptor or the first sequence. In some embodiments, the RNA is selected from the group consisting of: an mRNA, a non-coding RNA, a lncRNA, a miRNA, a double-stranded RNA, and a single-stranded RNA. In some embodiments, the method does not comprise degrading one strand of the double-stranded cDNA. In some embodiments, the method further comprises sequencing the amplicon.

In one aspect, the disclosure provides for a method for generating a strand-specific sequencing library comprising: contacting a DNA fragment with a primer comprising a first sequence; extending the primer, thereby generating a copy of the DNA sequence; and ligating an adaptor to one end of the copy of the DNA sequence, wherein the adaptor comprises a second sequence, thereby generating an asymmetric adaptor ligated cDNA, wherein the sequencing reads of the asymmetric adaptor ligated cDNA indicate strand specificity. In some embodiments, the first sequence comprises a portion of a first sequencing primer sequence. In some embodiments, the first sequence comprises a sample barcode. In some embodiments, the first sequence comprises a molecular barcode. In some embodiments, the generating comprises reverse transcription, second strand synthesis. In some embodiments, the adaptor is double-stranded. In some embodiments, the adaptor ligates to the 3′ end, the 5′ end, or both the 3′ and 5′ end of the double-stranded cDNA. In some embodiments, the second sequence comprises a portion of a second sequencing primer sequence. In some embodiments, the second sequence comprises a sample barcode. In some embodiments, the method further comprises amplifying the asymmetric adaptor ligated cDNA, thereby generating asymmetric amplicons. In some embodiments, the amplifying comprises PCR amplification with a first primer and a second primer. In some embodiments, the first primer hybridizes to at least a portion of the first sequence. In some embodiments, the second primer hybridizes to at least a portion of a second sequence. In some embodiments, the first and second primers comprise additional sequences to be added to the adaptor ligated cDNA. In some embodiments, the additional sequences comprise additional flow cell sequences. In some embodiments, the additional sequences comprise additional sequencing primer sequences. In some embodiments, the additional sequences comprise sample barcodes. 
In some embodiments, the method further comprises incorporating a universal sequence into the RNA. In some embodiments, the incorporating comprises using a terminal transferase. In some embodiments, some of the reads incorporate the sequence of the adaptor. In some embodiments, the reads correspond to a first strand of the double-stranded cDNA. In some embodiments, some of the reads incorporate the sequence of the first sequence. In some embodiments, the reads correspond to a second strand of the double-stranded cDNA. In some embodiments, the strand specificity is determined by whether a read of the sequencing reads comprises the sequence of the adaptor or the first sequence. In some embodiments, the method does not comprise degrading one strand of the double-stranded cDNA.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIGS. 1A and 1B illustrate an exemplary embodiment of the homopolymer tailing method of the disclosure.

FIGS. 2A and 2B illustrate an exemplary embodiment for breaking symmetry of a quasi-symmetric stochastically barcoded nucleic acid.

FIGS. 3A, 3B, and 3C illustrate an exemplary embodiment for the random priming method of the disclosure.

FIG. 4 depicts an exemplary stochastic barcode of the disclosure.

FIG. 5 depicts a schematic of an exemplary embodiment of the stochastic barcoding method of the disclosure.

FIGS. 6A, 6B and 6C depict the efficiency and mass yield of an exemplary adaptor ligation method used in Example 1.

FIG. 7 depicts the nucleic acid size distribution of the WTA product using an exemplary adaptor ligation method used in Example 2.

FIGS. 8A and 8B depict the efficiency of exemplary methods used in Examples 2 and 3. FIG. 8A shows efficiency using the second strand synthesis method as described in FIGS. 15A and 15B. FIG. 8B shows efficiency using the nicking and extension method as described in FIGS. 16A and 16B.

FIGS. 9A and 9B depict (A) an agarose gel and (B) mass yield of the product in FIG. 8B.

FIG. 10 depicts a bioanalyzer trace of the WTA product using an exemplary homopolymer tailing method in Example 4.

FIG. 11 depicts a bioanalyzer trace of the sequencing library made from the WTA product made with an exemplary homopolymer tailing method in Example 5.

FIG. 12 depicts percent of read mapping with exemplary restriction enzyme digestion methods in Example 6.

FIGS. 13A and 13B depict fastQC data of the base content of a sample using an exemplary adaptor ligation method in Example 6.

FIG. 14 shows the percent of read mapping with simultaneous use of restriction enzyme cleavage and a mismatched WTA primer (e.g., second universal primer).

FIGS. 15A and 15B depict an exemplary embodiment of the adaptor ligation method of the disclosure.

FIGS. 16A and 16B depict an exemplary embodiment of the RNase priming method of the disclosure.

FIG. 17 depicts exemplary adaptors used in the methods of the disclosure.

FIGS. 18A and 18B compare efficiency of an exemplary adaptor ligation method of Example 13 using 1 ng RNA per well (A) and 10 pg RNA per well (B).

FIG. 19 shows correlation between 1 ng/well or 10 pg/well total RNA using an exemplary adaptor ligation method of Example 13.

FIGS. 20A and 20B are graphical representations of the data generated in FIGS. 18 and 19. FIG. 20A is a PCA plot that clearly separates the Universal Human Reference RNA (UHRR) wells from the Human Brain Reference RNA (HBRR) wells; FIG. 20B is a heatmap of the genes used for PCA along with hierarchical clustering that shows all of the like RNA types clustering together.

FIG. 21 shows a schematic illustrating an exemplary embodiment of the whole transcriptome amplification method of the disclosure using template switching.

FIGS. 22A and 22B show data demonstrating an exemplary method of Example 7 for adaptor ligation onto nucleic acids attached to solid supports.

FIG. 23 shows that the high suppression adaptor prevents primer-dimer formation.

FIGS. 24A and 24B show whole transcriptome amplification product generated from beads either alone (A) or in combination with cells (B).

FIGS. 25A and 25B show Bioanalyzer traces of the WTA product (A) and library preparation (B) of the samples generated from FIG. 24.

FIG. 26 illustrates an exemplary embodiment of RNA-seq library preparation using the adaptor ligation method of the disclosure.

FIG. 27 illustrates an exemplary embodiment of DNA-seq library preparation using the adaptor ligation method of the disclosure.

FIGS. 28A and 28B show the effect of exonuclease treatment on the WTA adaptor ligation method of the disclosure.

FIGS. 29A and 29B show an exemplary experimental protocol for analyzing 1 ng of RNA with the whole transcriptome methods of the disclosure.

FIGS. 30A and 30B show an exemplary experimental protocol for analyzing 10 pg of RNA with the whole transcriptome methods of the disclosure.

FIGS. 31A-31D show a schematic illustrating an exemplary embodiment of the whole transcriptome amplification method of the disclosure using adaptor ligation with beads.

FIGS. 32A-32E show a schematic illustrating an exemplary embodiment of the whole transcriptome amplification method of the disclosure using transposome-based fragmentation and ligation with beads.

FIGS. 33A, 33B, 33C and 33D show exemplary results of WTA analysis using transposome-based fragmentation and ligation with beads and adaptor ligation with beads.

DETAILED DESCRIPTION

Definitions

Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art in the field to which this disclosure belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

As used herein, the term “transcriptome” refers to the set of all transcripts, such as messenger RNA (mRNA) molecules, small interfering RNA (siRNA) molecules, transfer RNA (tRNA) molecules, and ribosomal RNA (rRNA) molecules, in a sample, for example, a single cell or a population of cells. In some embodiments, transcriptome not only refers to the species of transcripts, such as mRNA species, but also the amount of each species in the sample. In some embodiments, a transcriptome includes each mRNA molecule in the sample, such as all the mRNA molecules in a single cell.

As used herein, the term “associated” or “associated with” can mean that two or more species are identifiable as being co-located at a point in time. An association can mean that two or more species are or were within a similar container. An association can be an informatics association, where for example digital information regarding two or more species is stored and can be used to determine that one or more of the species were co-located at a point in time. An association can also be a physical association. In some instances two or more associated species are “tethered”, “attached”, or “immobilized” to one another or to a common solid or semisolid surface. An association may refer to covalent or non-covalent means for attaching labels to solid or semi-solid supports such as beads. An association may comprise hybridization between a target and a label.

As used herein, the term “complementary” can refer to the capacity for precise pairing between two nucleotides. For example, if a nucleotide at a given position of a nucleic acid is capable of hydrogen bonding with a nucleotide of another nucleic acid, then the two nucleic acids are considered to be complementary to one another at that position. Complementarity between two single-stranded nucleic acid molecules may be “partial,” in which only some of the nucleotides bind, or it may be complete when total complementarity exists between the single-stranded molecules. A first nucleotide sequence can be said to be the “complement” of a second sequence if the first nucleotide sequence is complementary to the second nucleotide sequence. A first nucleotide sequence can be said to be the “reverse complement” of a second sequence, if the first nucleotide sequence is complementary to a sequence that is the reverse (i.e., the order of the nucleotides is reversed) of the second sequence. As used herein, the terms “complement”, “complementary”, and “reverse complement” can be used interchangeably. It is understood from the disclosure that if a molecule can hybridize to another molecule it may be the complement of the molecule that is hybridizing.
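By way of a non-limiting illustration, the distinction drawn above between a complement, a reverse complement, and partial complementarity can be sketched in Python. The function names and the restriction to the four unmodified DNA bases are assumptions made for this sketch only and are not part of the disclosure.

```python
# Illustrative sketch: plain DNA alphabet only (A, C, G, T), no analogs
# or ambiguity codes. All names here are hypothetical.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return "".join(_PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the complement with the order of the nucleotides reversed."""
    return complement(seq)[::-1]


def fraction_paired(a: str, b: str) -> float:
    """Fraction of positions at which strand `a` pairs with antiparallel strand `b`.

    A value of 1.0 corresponds to complete complementarity; a value between
    0 and 1 corresponds to partial complementarity.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    target = reverse_complement(b)
    matches = sum(x == y for x, y in zip(a.upper(), target))
    return matches / len(a)
```

For example, under these assumptions `fraction_paired("ATGC", "GCAT")` reports complete complementarity, while changing one base of either strand lowers the fraction accordingly.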

As used herein, the term “digital counting” can refer to a method for estimating a number of target molecules in a sample. Digital counting can include the step of determining a number of unique labels that have been associated with targets in a sample. This stochastic methodology transforms the problem of counting molecules from one of locating and identifying identical molecules to a series of yes/no digital questions regarding detection of a set of predefined labels.
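As a non-limiting sketch of the counting step described above, the following Python function counts the number of distinct labels observed for each target. The input format (pairs of a target identifier and a molecular label sequence) and the function name are assumptions chosen for illustration; the disclosure does not prescribe a particular data structure.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular labels per target.

    `reads` is an iterable of (target_id, molecular_label) pairs, one per
    sequencing read. Reads that share both target and label are treated as
    amplified copies of one original molecule and are counted once.
    """
    labels_by_target = defaultdict(set)
    for target_id, molecular_label in reads:
        labels_by_target[target_id].add(molecular_label)
    return {target: len(labels) for target, labels in labels_by_target.items()}
```

In this sketch, each read answers a yes/no question (has this label been seen for this target?), so the depth of amplification does not inflate the count.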

As used herein, the term “first universal label” can refer to a label that is universal for barcodes of the disclosure. A first universal label can be a sequencing primer binding site (e.g., a read primer binding site, such as for an Illumina sequencer).

As used herein, the term “label” or “labels” can refer to nucleic acid codes associated with a target within a sample. A label can be, for example, a nucleic acid label. A label can be an entirely or partially amplifiable label. A label can be an entirely or partially sequenceable label. A label can be a portion of a native nucleic acid that is identifiable as distinct. A label can be a known sequence. A label can comprise a junction of nucleic acid sequences, for example a junction of a native and non-native sequence. As used herein, the term “label” can be used interchangeably with the terms “index”, “tag,” or “label-tag.” Labels can convey information. For example, in various embodiments, labels can be used to determine an identity of a sample, a source of a sample, an identity of a cell, and/or a target.

As used herein, the term “non-depleting reservoirs” can refer to a pool of stochastic barcodes made up of many different labels. A non-depleting reservoir can comprise large numbers of different stochastic barcodes such that when the non-depleting reservoir is associated with a pool of targets each target is likely to be associated with a unique stochastic barcode. The uniqueness of each labeled target molecule can be determined by the statistics of random choice, and depends on the number of copies of identical target molecules in the collection compared to the diversity of labels. The size of the resulting set of labeled target molecules can be determined by the stochastic nature of the barcoding process, and analysis of the number of stochastic barcodes detected then allows calculation of the number of target molecules present in the original collection or sample. When the ratio of the number of copies of a target molecule present to the number of unique stochastic barcodes is low, the labeled target molecules are highly unique (i.e., there is a very low probability that more than one target molecule will have been labeled with a given label).
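The statistics of random choice referred to above can be illustrated with a short, non-limiting Python sketch. It assumes, for illustration only, that every label in the reservoir is equally likely to be chosen and that each target molecule picks its label independently (sampling with replacement, consistent with a reservoir that is not depleted). The function names are hypothetical.

```python
import math


def prob_all_unique(n_molecules: int, n_labels: int) -> float:
    """Probability that n_molecules copies of a target all receive different labels."""
    if n_molecules > n_labels:
        return 0.0
    p = 1.0
    for i in range(n_molecules):
        p *= 1.0 - i / n_labels
    return p


def expected_distinct_labels(n_molecules: int, n_labels: int) -> float:
    """Expected number of distinct labels observed among n_molecules copies."""
    return n_labels * (1.0 - (1.0 - 1.0 / n_labels) ** n_molecules)


def estimate_molecules(observed_distinct: int, n_labels: int) -> float:
    """Invert the expectation above to estimate the original number of copies."""
    if n_labels < 2 or not 0 <= observed_distinct < n_labels:
        raise ValueError("need n_labels >= 2 and 0 <= observed_distinct < n_labels")
    return math.log(1.0 - observed_distinct / n_labels) / math.log(1.0 - 1.0 / n_labels)
```

Under these assumptions, when the number of copies is small relative to the label diversity, `prob_all_unique` is close to 1 and the number of distinct labels detected is close to the number of molecules; as the ratio grows, `estimate_molecules` applies a correction for labels that were chosen more than once.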

As used herein, a “nucleic acid” can generally refer to a polynucleotide sequence, or fragment thereof. A nucleic acid can comprise nucleotides. A nucleic acid can be exogenous or endogenous to a cell. A nucleic acid can exist in a cell-free environment. A nucleic acid can be a gene or fragment thereof. A nucleic acid can be DNA. A nucleic acid can be RNA. A nucleic acid can comprise one or more analogs (e.g., altered backbone, sugar, or nucleobase). Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol containing nucleotides, biotin linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. “Nucleic acid”, “polynucleotide”, “target polynucleotide”, and “target nucleic acid” can be used interchangeably.

A nucleic acid can comprise one or more modifications (e.g., a base modification, a backbone modification), to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A nucleic acid can comprise a nucleic acid affinity tag. A nucleoside can be a base-sugar combination. The base portion of the nucleoside can be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides can be nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming nucleic acids, the phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are generally suitable. In addition, linear compounds can have internal nucleotide base complementarity and can therefore fold in a manner as to produce a fully or partially double-stranded compound. Within nucleic acids, the phosphate groups can commonly be referred to as forming the internucleoside backbone of the nucleic acid. The linkage or backbone of the nucleic acid can be a 3′ to 5′ phosphodiester linkage.

A nucleic acid can comprise a modified backbone and/or modified internucleoside linkages. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone. Suitable modified nucleic acid backbones containing a phosphorus atom therein can include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates, 5′-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, a 5′ to 5′ or a 2′ to 2′ linkage.

A nucleic acid can comprise polynucleotide backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These can include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts.

A nucleic acid can comprise a nucleic acid mimetic. The term “mimetic” can be intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups. Replacement of only the furanose ring can also be referred to as being a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid can be a peptide nucleic acid (PNA). In a PNA, the sugar-backbone of a polynucleotide can be replaced with an amide containing backbone, in particular an aminoethylglycine backbone. The nucleotides can be retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. The backbone in PNA compounds can comprise two or more linked aminoethylglycine units, which gives PNA an amide containing backbone. The heterocyclic base moieties can be bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

A nucleic acid can comprise a morpholino backbone structure. For example, a nucleic acid can comprise a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage can replace a phosphodiester linkage.

A nucleic acid can comprise linked morpholino units (i.e., morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. Linking groups can link the morpholino monomeric units in a morpholino nucleic acid. Non-ionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be nonionic mimics of nucleic acids. A variety of compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimetic can be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a nucleic acid molecule can be replaced with a cyclohexenyl ring. CeNA DMT protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry. The incorporation of CeNA monomers into a nucleic acid chain can increase the stability of a DNA/RNA hybrid. CeNA oligoadenylates can form complexes with nucleic acid complements with similar stability to the native complexes. A further modification can include Locked Nucleic Acids (LNAs) in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and a bicyclic sugar moiety. The linkage can be a methylene (-CH2-)n group bridging the 2′ oxygen atom and the 4′ carbon atom wherein n is 1 or 2. LNA and LNA analogs can display very high duplex thermal stabilities with complementary nucleic acid (Tm=+3 to +10° C.), stability towards 3′-exonucleolytic degradation and good solubility properties.

A nucleic acid can also include nucleobase (often referred to simply as “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases can include the purine bases (e.g., adenine (A) and guanine (G)) and the pyrimidine bases (e.g., thymine (T), cytosine (C) and uracil (U)). Modified nucleobases can include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo[2,3-d]pyrimidin-2-one).

As used herein, the term “quasi-symmetric stochastically barcoded nucleic acid” can refer to a molecule comprising a stochastic barcode of the disclosure and ends that are symmetric enough to hybridize together to form a panhandle structure (e.g., for suppression PCR), but may not be identical. A quasi-symmetric stochastically barcoded nucleic acid can behave like a symmetric nucleic acid, but have an asymmetric sequence.

As used herein, the term “sample” can refer to a composition comprising targets. Suitable samples for analysis by the disclosed methods, devices, and systems include cells, single cells, tissues, organs, or organisms.

As used herein, the term “sampling device” or “device” can refer to a device which may take a section of a sample and/or place the section on a substrate. A sampling device can refer to, for example, a fluorescence activated cell sorting (FACS) machine, a cell sorter machine, a biopsy needle, a biopsy device, a tissue sectioning device, a microfluidic device, a blade grid, and/or a microtome.

As used herein, the term “second universal label” can refer to a label that is universal for barcodes of the disclosure. A second universal label can be a modified version of a sequencing primer binding site (e.g., a read primer binding site, such as one for an Illumina sequencer). A second universal label can be a modified version of a label of the disclosure (e.g., first universal label).

As used herein, the term “solid support” can refer to discrete solid or semi-solid surfaces to which a plurality of stochastic barcodes may be attached. A solid support can encompass any type of solid, porous, or hollow sphere, ball, bearing, cylinder, or other similar configuration composed of plastic, ceramic, metal, or polymeric material (e.g., hydrogel) onto which a nucleic acid can be immobilized (e.g., covalently or non-covalently). A solid support can comprise a discrete particle that may be spherical (e.g., microspheres) or have a non-spherical or irregular shape, such as cubic, cuboid, pyramidal, cylindrical, conical, oblong, or disc-shaped, and the like. A plurality of solid supports spaced in an array may not comprise a substrate. The term “solid support” can be used interchangeably with the term “bead.”

A solid support can refer to a “substrate.” A substrate can be a type of solid support. A substrate can refer to a continuous solid or semi-solid surface on which the methods of the disclosure may be performed. A substrate can refer to an array, a cartridge, a chip, a device, or a slide, for example. As used herein, “solid support” and “substrate” can be used interchangeably.

As used herein, the term “stochastic barcode” can refer to a polynucleotide sequence comprising labels of the disclosure. A stochastic barcode can be a polynucleotide sequence that can be used for stochastic barcoding. Stochastic barcodes can be used to quantify targets within a sample. Stochastic barcodes can be used to control for errors which may occur after a label is associated with a target. For example, a stochastic barcode can be used to assess amplification or sequencing errors. A stochastic barcode associated with a target can be called a stochastic barcode-target or stochastic barcode-tag-target.

As used herein, the term “stochastic barcoding” can refer to the random labeling (e.g., barcoding) of nucleic acids. Stochastic barcoding can utilize a recursive Poisson strategy to associate and quantify labels associated with targets. As used herein, the term “stochastic barcoding” can be used interchangeably with “stochastic labeling.”
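The counting idea behind random labeling can be illustrated with a short calculation. The sketch below is not the disclosed strategy itself; it assumes, purely for illustration, that each of n target molecules independently receives one of m equally likely labels, and the function names are hypothetical.

```python
import math

def expected_unique_labels(n_molecules: int, n_labels: int) -> float:
    """Expected number of distinct labels seen when each molecule draws one
    of n_labels uniformly at random (an illustrative assumption)."""
    return n_labels * (1.0 - (1.0 - 1.0 / n_labels) ** n_molecules)

def estimate_molecules(observed_unique: float, n_labels: int) -> float:
    """Invert the expectation above to estimate the number of molecules
    from the number of distinct labels observed."""
    if observed_unique >= n_labels:
        raise ValueError("all labels observed; the estimate is unbounded")
    return math.log(1.0 - observed_unique / n_labels) / math.log(1.0 - 1.0 / n_labels)
```

Under this assumption, the number of distinct labels observed approaches the number of molecules when the label pool is much larger than the number of molecules, and saturates as the pool is exhausted.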

As used herein, the term “target” can refer to a composition which can be associated with a stochastic barcode. Exemplary suitable targets for analysis by the disclosed methods, devices, and systems include oligonucleotides, DNA, RNA, mRNA, microRNA, tRNA, and the like. Targets can be single or double stranded. In some embodiments, targets can be proteins. In some embodiments, targets are lipids.

The term “reverse transcriptase” can refer to a group of enzymes having reverse transcriptase activity (i.e., that catalyze synthesis of DNA from an RNA template). In general, such enzymes include, but are not limited to, retroviral reverse transcriptase, retrotransposon reverse transcriptase, retroplasmid reverse transcriptase, retron reverse transcriptase, bacterial reverse transcriptase, group II intron-derived reverse transcriptase, and mutants, variants or derivatives thereof. Non-retroviral reverse transcriptases include non-LTR retrotransposon reverse transcriptase, retroplasmid reverse transcriptase, retron reverse transcriptase, and group II intron reverse transcriptase. Examples of group II intron reverse transcriptase include the Lactococcus lactis Ll.LtrB intron reverse transcriptase, the Thermosynechococcus elongatus TeI4c intron reverse transcriptase, or the Geobacillus stearothermophilus GsI-IIC intron reverse transcriptase. Other classes of reverse transcriptase can include many classes of non-retroviral reverse transcriptase (e.g., retrons, group II introns, and diversity-generating retroelements, among others).

The term “template switching” can refer to the ability of a reverse transcriptase to switch from an initial nucleic acid sequence template to the 3′ end of a new nucleic acid sequence template having little or no complementarity to the 3′ end of the nucleic acid synthesized from the initial template. Nucleic acid copies of a target polynucleotide can be made using template switching. Template switching allows, e.g., a DNA copy to be prepared using a reverse transcriptase that switches from an initial nucleic acid sequence template to the 3′ end of a new nucleic acid sequence template having little or no complementarity to the 3′ end of the DNA synthesized from the initial template, thereby allowing the synthesis of a continuous product DNA that directly links an adaptor sequence to a target oligonucleotide sequence without ligation. Template switching can comprise ligation of an adaptor, homopolymer tailing (e.g., polyadenylation), a random primer, or an oligonucleotide that the polymerase can associate with.

Stochastic Barcodes

As used herein, the term “stochastic barcode” refers to a polynucleotide sequence that can be used to stochastically label (e.g., barcode, tag) a target. A stochastic barcode can comprise one or more labels. Exemplary labels can include a universal label, a cellular label, a molecular label, a sample label, a plate label, a spatial label, and/or a pre-spatial label. FIG. 4 illustrates an exemplary stochastic barcode of the disclosure. A stochastic barcode 404 can comprise a 5′ amine that may link the stochastic barcode to a solid support 405. In some embodiments, the stochastic barcode can comprise a universal label, a dimension label, a spatial label, a cellular label, and/or a molecular label. In some embodiments, the universal label can be the 5′-most label. In some embodiments, the molecular label can be the 3′-most label. In some embodiments, the spatial label, dimension label, and the cellular label can be in any order. In some embodiments, the universal label, the spatial label, the dimension label, the cellular label, and the molecular label are in any order. In some embodiments, the stochastic barcode can comprise a target-binding region. In some embodiments, the target-binding region can interact with a target (e.g., target nucleic acid, RNA, mRNA, DNA) in a sample. For example, a target-binding region can comprise an oligo dT sequence which can interact with poly-A tails of mRNAs. In some embodiments, the labels of the stochastic barcode (e.g., universal label, dimension label, spatial label, cellular label, and molecular label) can be separated by, or by about, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides.
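One of the label arrangements described above can be sketched as a simple parser. The sketch assumes a hypothetical layout, read 5′ to 3′, of universal label, cellular label, molecular label, and oligo dT with no spacer nucleotides; the label lengths, the minimum oligo dT length, and the function name are illustrative assumptions and are not taken from the disclosure.

```python
from typing import NamedTuple, Optional

class ParsedBarcode(NamedTuple):
    cellular_label: str
    molecular_label: str

def parse_barcode(read: str, universal_label: str,
                  cell_len: int = 9, mol_len: int = 8,
                  min_dt: int = 6) -> Optional[ParsedBarcode]:
    """Split a read into labels, assuming the hypothetical order
    universal label / cellular label / molecular label / oligo dT."""
    if not read.startswith(universal_label):
        return None  # universal label (5'-most label) not found
    start = len(universal_label)
    cell = read[start:start + cell_len]
    mol = read[start + cell_len:start + cell_len + mol_len]
    tail = read[start + cell_len + mol_len:]
    # Require a full-length molecular label followed by the oligo dT stretch.
    if len(mol) < mol_len or not tail.startswith("T" * min_dt):
        return None
    return ParsedBarcode(cell, mol)
```

Reads that do not match the assumed layout return None instead of a partial result, so downstream counting only sees fully parsed barcodes.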

A stochastic barcode can, for example, comprise one or more universal labels. The one or more universal labels can be the same for all stochastic barcodes in the set of stochastic barcodes (e.g., attached to a given solid support). In some embodiments, the one or more universal labels can be the same for all stochastic barcodes attached to a plurality of beads. In some embodiments, a universal label can comprise a nucleic acid sequence that is capable of hybridizing to a sequencing primer. In some embodiments, sequencing primers can be used for sequencing stochastic barcodes comprising a universal label. In some embodiments, sequencing primers (e.g., universal sequencing primers) can comprise sequencing primers associated with high-throughput sequencing platforms. In some embodiments, the universal label can comprise a nucleic acid sequence that is capable of hybridizing to a PCR primer. In some embodiments, the universal label can comprise a nucleic acid sequence that is capable of hybridizing to a sequencing primer and a PCR primer. In some embodiments, the nucleic acid sequence of the universal label that is capable of hybridizing to a sequencing or PCR primer can be referred to as a primer binding site. In some embodiments, the universal label can comprise a sequence that can be used to initiate transcription of the stochastic barcode. In some embodiments, the universal label can comprise a sequence that may be used for extension of the stochastic barcode or a region within the stochastic barcode. In some embodiments, the universal label can be, or can be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, the universal label can comprise at least about 10 nucleotides. In some embodiments, the universal label can be, or can be at most about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, a cleavable linker or modified nucleotide can be part of the universal label sequence to enable the stochastic barcode to be cleaved off from the support. As used herein, a universal label can be used interchangeably with “universal PCR primer.”

In some embodiments, the stochastic barcode can comprise one or more dimension labels. A dimension label can comprise a nucleic acid sequence that provides information about a dimension in which the stochastic labeling occurred. For example, the dimension label can provide information about the time at which a target was stochastically barcoded. In some embodiments, the dimension label can be associated with a time of stochastic barcoding in a sample. In some embodiments, the dimension label can be activated at the time of stochastic labeling. In some embodiments, different dimension labels can be activated at different times. In some embodiments, the dimension label provides information about the order in which targets, groups of targets, and/or samples were stochastically barcoded. For example, a population of cells can be stochastically barcoded at the G0 phase of the cell cycle. In some embodiments, the cells can be pulsed again with stochastic barcodes at the G1 phase of the cell cycle. The cells can be pulsed again with stochastic barcodes at the S phase of the cell cycle, and so on. Stochastic barcodes at each pulse (e.g., each phase of the cell cycle) can comprise different dimension labels. In this way, the dimension label provides information about which targets were labeled at which phase of the cell cycle. In some embodiments, dimension labels can interrogate many different biological times. Exemplary biological times can include, but are not limited to, the cell cycle, transcription (e.g., transcription initiation), and transcript degradation. In another example, a sample (e.g., a cell, a population of cells) can be stochastically labeled before and/or after treatment with a drug and/or therapy. The changes in the number of copies of distinct targets can be indicative of the sample's response to the drug and/or therapy.

The dimension label can be activatable. For example, an activatable dimension label can be activated at a specific timepoint. The activatable dimension label can be constitutively activated (e.g., not turned off). The activatable dimension label can be reversibly activated (e.g., the activatable dimension label can be turned on and turned off). The dimension label can be reversibly activatable at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more times. The dimension label can be activated with fluorescence, light, a chemical event (e.g., cleavage, ligation of another molecule, addition of modifications (e.g., pegylated, sumoylated, acetylated, methylated, deacetylated, demethylated)), a photochemical event (e.g., photocaging), or introduction of a non-natural nucleotide.

The dimension label can be identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support can comprise the same dimension label. In some embodiments, at least 60% of stochastic barcodes on the same solid support can comprise the same dimension label. In some embodiments, at least 95% of stochastic barcodes on the same solid support can comprise the same dimension label.

There can be 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more unique dimension label sequences represented in a plurality of solid supports (e.g., beads). A dimension label can be at least, or at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, the dimension label can be, or be at most about, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer or more nucleotides in length. In some embodiments, the dimension label comprises from about 5 to about 200 nucleotides. In some embodiments, the dimension label comprises from about 10 to about 150 nucleotides. In some embodiments, the dimension label comprises from about 20 to about 125 nucleotides in length.

A stochastic barcode can comprise one or more spatial labels. A spatial label can comprise a nucleic acid sequence that provides information about the spatial orientation of a target molecule which is associated with the stochastic barcode. In some embodiments, the spatial label can be associated with a coordinate in a sample. In some embodiments, the coordinate can be a fixed coordinate. For example, a coordinate can be fixed in reference to a substrate. In some embodiments, the spatial label can be in reference to a two- or three-dimensional grid. In some embodiments, the coordinate can be fixed in reference to a landmark. In some embodiments, the landmark can be identifiable in space. In some embodiments, the landmark can be a structure which can be imaged. In some embodiments, the landmark can be a biological structure, for example an anatomical landmark. In some embodiments, the landmark can be a cellular landmark, for instance an organelle. In some embodiments, the landmark can be a non-natural landmark such as a structure with an identifiable identifier such as a color code, bar code, magnetic property, fluorescence, radioactivity, or a unique size or shape. In some embodiments, the spatial label can be associated with a physical partition (e.g., a well, a container, or a droplet). In some instances, multiple spatial labels are used together to encode one or more positions in space.

The spatial label can be identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support can comprise the same spatial label. In some embodiments, at least 60% of stochastic barcodes on the same solid support can comprise the same spatial label. In some embodiments, at least 95% of stochastic barcodes on the same solid support can comprise the same spatial label.

There can be 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more unique spatial label sequences represented in a plurality of solid supports (e.g., beads). A spatial label can be, or be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. The spatial label can be, or be at most about, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer or more nucleotides in length. In some embodiments, the spatial label comprises from about 5 to about 200 nucleotides. In some embodiments, the spatial label comprises from about 10 to about 150 nucleotides. In some embodiments, the spatial label comprises from about 20 to about 125 nucleotides in length.

Stochastic barcodes can comprise one or more cellular labels (i.e., sample labels). As used herein, the terms “sample label” and “cellular label” are used interchangeably. A cellular label can comprise a nucleic acid sequence that provides information for determining which target nucleic acid originated from which cell. In some embodiments, the cellular label is identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support can comprise the same cellular label. In some embodiments, at least 60% of stochastic barcodes on the same solid support can comprise the same cellular label. In some embodiments, at least 95% of stochastic barcodes on the same solid support can comprise the same cellular label.

There can be 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more unique cellular label sequences represented in a plurality of solid supports (e.g., beads). The cellular label can be, or be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. The cellular label can be, or be at most about, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer or more nucleotides in length. In some embodiments, the cellular label comprises from about 5 to about 200 nucleotides. In some embodiments, the cellular label comprises from about 10 to about 150 nucleotides. In some embodiments, the cellular label comprises from about 20 to about 125 nucleotides in length.
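The relationship between label length and the number of unique label sequences can be worked out directly. The sketch below assumes a four-letter alphabet (A, C, G, T) with every sequence allowed, which is an upper bound rather than a design rule from the disclosure; the function names are hypothetical.

```python
def max_unique_labels(length: int) -> int:
    """Upper bound on distinct label sequences of a given length,
    assuming all four bases are allowed at every position."""
    return 4 ** length

def min_label_length(n_unique: int) -> int:
    """Shortest length whose upper bound reaches n_unique sequences."""
    length = 0
    while max_unique_labels(length) < n_unique:
        length += 1
    return length
```

Under this assumption, 10⁴ unique labels need at least 7 nucleotides and 10⁶ need at least 10; any error-correction or sequence-composition constraint would increase the required length.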

Stochastic barcodes can comprise one or more molecular labels. A molecular label can comprise a nucleic acid sequence that provides identifying information for the specific type of target nucleic acid species hybridized to the stochastic barcode. The molecular label can comprise a nucleic acid sequence that provides a counter for the specific occurrence of the target nucleic acid species hybridized to the stochastic barcode (e.g., target-binding region). In some embodiments, a diverse set of molecular labels are attached to a given solid support (e.g., bead). In some embodiments, there can be 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10⁵ or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10, 10², 10³, 10⁴, 10⁵ or more unique molecular label sequences attached to a given solid support (e.g., bead). The molecular label can be, or be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. The molecular label can be, or be at most about, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer nucleotides in length.
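The counter role of the molecular label can be sketched as follows. The example assumes that each sequencing read has already been reduced to a (cellular label, gene, molecular label) triple; that record format and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def count_molecules(records):
    """Count distinct molecular labels per (cellular label, gene) pair.

    Reads that share a cellular label, gene, and molecular label are treated
    as amplification copies of one original molecule and counted once.
    """
    seen = defaultdict(set)
    for cell, gene, molecular_label in records:
        seen[(cell, gene)].add(molecular_label)
    return {key: len(labels) for key, labels in seen.items()}
```

For example, three reads of one gene in one cell that carry only two distinct molecular labels would be reported as two molecules.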

A stochastic barcode can comprise one or more target binding regions. In some embodiments, a target binding region can comprise a nucleic acid sequence that hybridizes specifically to a target (e.g., target nucleic acid, target molecule, e.g., a cellular nucleic acid to be analyzed), for example to a specific gene sequence. In some embodiments, the target binding region can comprise a nucleic acid sequence that can attach (e.g., hybridize) to a specific location of a specific target nucleic acid. In some embodiments, the target binding region can comprise a nucleic acid sequence that is capable of specific hybridization to a restriction site overhang (e.g., an EcoRI sticky-end overhang). The stochastic barcode can then ligate to any nucleic acid molecule comprising a sequence complementary to the restriction site overhang.

A target-binding region can, for example, hybridize with a target of interest. For example, the target-binding region can comprise an oligo dT which can hybridize with mRNAs comprising poly-adenylated ends. A target-binding region can be gene-specific. For example, the target-binding region can be configured to hybridize to a specific region of a target. The target-binding region can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length. The target-binding region can be, or be at most, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length. A target-binding region can be from 5 to 30 nucleotides in length. When a stochastic barcode comprises a gene-specific target-binding region, the stochastic barcode can be referred to as a gene-specific stochastic barcode.

A target binding region can comprise one or more non-specific target nucleic acid sequences. A non-specific target nucleic acid sequence can refer to a sequence that may bind to multiple target nucleic acids, independent of the specific sequence of the target nucleic acid. For example, the target binding region can comprise a random multimer sequence, or an oligo-dT sequence that hybridizes to the poly-A tail on mRNA molecules. A random multimer sequence can be, for example, a random dimer, trimer, tetramer, pentamer, hexamer, heptamer, octamer, nonamer, decamer, or higher multimer sequence of any length. In some embodiments, the target binding region is the same for all stochastic barcodes attached to a given bead. In some embodiments, the target binding regions for the plurality of stochastic barcodes attached to a given bead can comprise two or more different target binding sequences. The target binding region can be, or be at least about, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. The target binding region can be, or be at most about, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length.

A stochastic barcode can comprise one or more orientation properties which can be used to orient (e.g., align) the stochastic barcodes. The stochastic barcode can comprise one or more moieties for isoelectric focusing. In some embodiments, different stochastic barcodes can comprise different isoelectric focusing points. For example, in some embodiments, when these stochastic barcodes are introduced to a sample, the sample can undergo isoelectric focusing in order to orient the stochastic barcodes in a known way. In this way, the orientation property can be used to develop a known map of stochastic barcodes in a sample. Exemplary orientation properties can include electrophoretic mobility (e.g., based on size of the stochastic barcode), isoelectric point, spin, conductivity, and/or self-assembly. For example, stochastic barcodes can comprise an orientation property of self-assembly, and can self-assemble into a specific orientation (e.g., nucleic acid nanostructure) upon activation.

A stochastic barcode can comprise one or more affinity properties. For example, a spatial label can comprise an affinity property. The affinity property can include, in some embodiments, a chemical and/or biological moiety that can facilitate binding of the stochastic barcode to another entity (e.g., cell receptor). For example, an affinity property can comprise an antibody. In some embodiments, the antibody can be specific for a specific moiety (e.g., receptor) on a sample. In some embodiments, the antibody can guide the stochastic barcode to a specific cell type or molecule. Targets at and/or near the specific cell type or molecule can be stochastically labeled. An affinity property can also provide spatial information in addition to the nucleotide sequence of the spatial label because the antibody can guide the stochastic barcode to a specific location. The antibody can be a therapeutic antibody, a monoclonal antibody, or a polyclonal antibody. The antibody can be humanized or chimeric. The antibody can be a naked antibody or a fusion antibody.

The antibody can refer to a full-length (i.e., naturally occurring or formed by normal immunoglobulin gene fragment recombinatorial processes) immunoglobulin molecule (e.g., an IgG antibody) or an immunologically active (i.e., specifically binding) portion of an immunoglobulin molecule, like an antibody fragment.

The antibody can, in some embodiments, be an antibody fragment. An antibody fragment can be, for example, a portion of an antibody such as F(ab′)2, Fab′, Fab, Fv, sFv and the like. The antibody fragment can, in some embodiments, bind with the same antigen that is recognized by the full-length antibody. The antibody fragment, in some embodiments, can include isolated fragments consisting of the variable regions of antibodies, such as the “Fv” fragments consisting of the variable regions of the heavy and light chains and recombinant single chain polypeptide molecules in which light and heavy variable regions are connected by a peptide linker (“scFv proteins”). Exemplary antibodies can include, but are not limited to, antibodies for cancer cells, antibodies for viruses, antibodies that bind to cell surface receptors (CD8, CD34, CD45), and therapeutic antibodies.

The cellular label and/or any label of the disclosure can further comprise a unique set of nucleic acid sub-sequences of defined length, e.g., 7 nucleotides each (equivalent to the number of bits used in some Hamming error correction codes), which are designed to provide error correction capability. The set of error correction sub-sequences comprising 7-nucleotide sequences can be designed such that any pairwise combination of sequences in the set exhibits a defined “genetic distance” (or number of mismatched bases); for example, a set of error correction sub-sequences may be designed to exhibit a genetic distance of 3 nucleotides. In some embodiments, the length of the nucleic acid sub-sequences used for creating error correction codes can vary; for example, they can be at least 3 nucleotides, at least 7 nucleotides, at least 15 nucleotides, or at least 31 nucleotides in length. In some embodiments, nucleic acid sub-sequences of other lengths may be used for creating error correction codes.
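A set of sub-sequences with a defined pairwise genetic distance can be built in several ways. The sketch below uses a simple greedy search, which is an illustrative choice and not a construction described in the disclosure; the function names and the `max_size` parameter are hypothetical.

```python
from itertools import product

def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def build_code_set(length: int = 7, min_distance: int = 3, max_size=None):
    """Greedily collect sequences whose pairwise distance is >= min_distance."""
    chosen = []
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        # Keep the candidate only if it is far enough from every kept sequence.
        if all(hamming_distance(seq, kept) >= min_distance for kept in chosen):
            chosen.append(seq)
            if max_size is not None and len(chosen) >= max_size:
                break
    return chosen
```

With a minimum distance of 3, any single mismatched base leaves a sub-sequence closer to its original than to any other member of the set, which is what makes single-error correction possible. The exhaustive search without `max_size` can be slow in pure Python, so a size limit is useful for experimentation.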

Stochastic barcodes can, in some embodiments, comprise error-correcting sequences (e.g., Hamming codes) in them for error-correction. A Hamming code can refer to an arithmetic process that identifies unique binary codes based upon inherent redundancy that are capable of correcting single bit errors. For example, a Hamming code can be matched with a nucleic acid barcode in order to screen for single nucleotide errors occurring during nucleic acid amplification. The identification of a single nucleotide error by using a Hamming code can thereby allow for the correction of the nucleic acid barcode.
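Single-nucleotide error correction can be illustrated with nearest-neighbor decoding against a list of known barcodes. This is a simplified stand-in for a Hamming code rather than a Hamming code construction, and it assumes that the known barcodes differ from one another by at least 3 bases; the function name and the whitelist are hypothetical.

```python
def correct_barcode(observed: str, whitelist):
    """Return the unique known barcode within one mismatch of `observed`,
    or None when no barcode, or more than one, is that close."""
    matches = [
        ref for ref in whitelist
        if len(ref) == len(observed)
        and sum(a != b for a, b in zip(observed, ref)) <= 1
    ]
    return matches[0] if len(matches) == 1 else None
```

An observed barcode with more than one error matches nothing under these assumptions and is reported as uncorrectable rather than being assigned to a guess.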

When a stochastic barcode comprises more than one of a type of label (e.g., more than one cellular label or more than one molecular label), the labels can be interspersed with a linker label sequence. For example, the linker label sequence can be, or be at least about, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, the linker label sequence can be, or be at most about, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, the linker label sequence is 12 nucleotides in length. The linker label sequence can be used, in some embodiments, to facilitate the synthesis of the stochastic barcode. In some embodiments, the linker label can comprise an error-correcting (e.g., Hamming) code.

Quasi-Symmetric Stochastically Barcoded Nucleic Acids

The disclosure provides for compositions comprising quasi-symmetric stochastically barcoded nucleic acids. A quasi-symmetric stochastically barcoded nucleic acid can comprise a stochastic barcode of the disclosure. A quasi-symmetric stochastically barcoded nucleic acid can comprise two ends that are quasi-symmetric. A quasi-symmetric stochastically barcoded nucleic acid can comprise a nucleic acid of any target sequence and/or of any length. For example, a quasi-symmetric stochastically barcoded nucleic acid can be RNA (e.g., mRNA, miRNA, tRNA, lncRNA, non-coding RNA, coding RNA, and the like), DNA (e.g., genomic DNA, intron, exon, coding region, non-coding region, and the like), or a combination thereof. The quasi-symmetric stochastically barcoded nucleic acid can be, or be at least, 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 or more nucleotides in length. The quasi-symmetric stochastically barcoded nucleic acid can be, or be at most, 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 or more nucleotides in length.

The quasi-symmetric stochastically barcoded nucleic acid can comprise two ends that are quasi-symmetric. The two ends can comprise universal labels (e.g., a universal label on the stochastic barcode (herein, a first universal label), and a universal label from the adaptor (herein, a second universal label)). The universal label of the adaptor (e.g., second universal label) can be the same as the universal label on the stochastic barcode (e.g., on the 5′ end of the single-stranded cDNA molecule, e.g., first universal label). The second universal label can comprise a sequence that is a subset of the first universal label. For example, the second universal label can comprise at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the sequence of the first universal label. The second universal label can comprise at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the sequence of the first universal label. The second universal label can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides shorter or longer than the first universal label. The second universal label can be, or be at most, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides shorter or longer than the first universal label. In some embodiments, the second universal label is shorter than the first universal label by 2 nucleotides. The second universal label can differ from the first universal label by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The second universal label can differ from the first universal label by at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The second universal label can be at most 99, 98, 97, 96, 95, 94, 93, 92, 91 or 90% or less identical to the first universal label. The second universal label may not be identical to said first universal label. The second universal label can hybridize to the first universal label over at least 50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% of said first universal label. The second universal label can be, or be at most, 99% identical to the first universal label over 50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% of the first universal label. The sequence of the first universal label can be a sequencing primer binding site (e.g., Illumina read 2 sequence). The sequence of the second universal label can be a modified sequencing primer binding site (e.g., Illumina modified read 2 sequence).
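The length and identity relationships between the two universal labels can be illustrated with a minimal sketch. The function names and the example sequences below are hypothetical and are not taken from the disclosure; identity is computed here as a simple ungapped, left-aligned comparison, which is only one of several ways such a percentage could be defined.

```python
def percent_identity(first_label: str, second_label: str) -> float:
    """Percent of positions in first_label matched by second_label.

    Assumption: ungapped comparison with both labels aligned at their
    first nucleotide; positions of first_label that second_label does
    not reach count as non-matching.
    """
    if not first_label:
        raise ValueError("first_label must be non-empty")
    matches = sum(
        a == b for a, b in zip(first_label.upper(), second_label.upper())
    )
    return 100.0 * matches / len(first_label)


def length_difference(first_label: str, second_label: str) -> int:
    """Number of nucleotides by which second_label is shorter (positive)
    or longer (negative) than first_label."""
    return len(first_label) - len(second_label)


# Hypothetical labels: the second is the first truncated by 2 nucleotides.
first = "ACGTACGTAC"
second = "ACGTACGT"
identity = percent_identity(first, second)
shorter_by = length_difference(first, second)
```

With these illustrative inputs the second label is 2 nucleotides shorter than the first and covers 80% of its sequence.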

In some embodiments, the first universal label and the second universal label are able to hybridize to each other (e.g., for use in suppression PCR). In some embodiments, the first and second universal labels can hybridize with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more mismatches. In some embodiments, the first and second universal labels can hybridize with at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more mismatches. The extent of hybridization can relate to the amount of suppression occurring during suppression PCR. For example, sequences that can hybridize strongly can be suppressed more than sequences that hybridize weakly.
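As a rough illustration of counting mismatches between two sequences that hybridize, the sketch below compares one strand against the reverse complement of the other. It assumes equal-length DNA strands written 5′ to 3′ and ignores gaps, bulges, and thermodynamics; the function names and test sequences are hypothetical.

```python
# Translation table for DNA base complements (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridization_mismatches(strand_a: str, strand_b: str) -> int:
    """Count mismatched positions when strand_a pairs with strand_b.

    Assumption: both strands have the same length and pair end to end
    in antiparallel orientation, with no gaps.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    perfect_partner = reverse_complement(strand_b)
    return sum(x != y for x, y in zip(strand_a.upper(), perfect_partner))


perfect = hybridization_mismatches("AACG", "CGTT")      # fully complementary
one_mismatch = hybridization_mismatches("AACG", "CGTA")  # one mispaired base
```

A lower mismatch count corresponds to stronger hybridization, which, per the paragraph above, can correspond to greater suppression.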

Solid Supports

The stochastic barcodes disclosed herein can be attached to a solid support (e.g., bead, substrate). As used herein, the terms “tethered”, “attached”, and “immobilized” are used interchangeably, and can refer to covalent or non-covalent means for attaching stochastic barcodes to a solid support. Any of a variety of different solid supports can be used as solid supports for attaching pre-synthesized stochastic barcodes or for in situ solid-phase synthesis of stochastic barcodes.

In some embodiments, a solid support is a bead (e.g., a magnetic or polymer bead). The bead can encompass any type of solid, porous, or hollow sphere, ball, bearing, cylinder, or other similar configuration composed of plastic, ceramic, metal, or polymeric material onto which a nucleic acid may be immobilized (e.g., covalently or non-covalently). The bead can comprise a discrete particle that may be spherical (e.g., microspheres) or have a non-spherical or irregular shape, such as cubic, cuboid, pyramidal, cylindrical, conical, oblong, or disc-shaped, and the like. The bead can be non-spherical in shape.

Beads disclosed herein can comprise one or more of a variety of materials including, but not limited to, paramagnetic materials (e.g., magnesium, molybdenum, lithium, and tantalum), superparamagnetic materials (e.g., ferrite (Fe₃O₄; magnetite) nanoparticles), ferromagnetic materials (e.g., iron, nickel, cobalt, some alloys thereof, and some rare earth metal compounds), ceramic, plastic, glass, polystyrene, silica, methylstyrene, acrylic polymers, titanium, latex, sepharose, agarose, hydrogel, polymer, cellulose, nylon, and any combination thereof.

The diameter of the beads can vary, for example be, or be at least about, 5 μm, 10 μm, 20 μm, 25 μm, 30 μm, 35 μm, 40 μm, 45 μm or 50 μm. The diameter of the beads can be at most about 5 μm, 10 μm, 20 μm, 25 μm, 30 μm, 35 μm, 40 μm, 45 μm or 50 μm. In some embodiments, the diameter of the bead can be related to the diameter of the wells of the substrate. For example, the diameter of the bead can be at least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% longer or shorter than the diameter of the well. The diameter of the bead can be at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% longer or shorter than the diameter of the well. The diameter of the bead can be related to the diameter of a cell (e.g., a single cell entrapped by a well of the substrate). The diameter of the bead can be at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300% or more longer or shorter than the diameter of the cell. The diameter of the bead can be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300% or more longer or shorter than the diameter of the cell.

The bead can be attached to and/or embedded in a substrate. For example, the bead can be attached to and/or embedded in a gel, hydrogel, polymer and/or matrix. The spatial position of the bead within a substrate (e.g., gel, matrix, scaffold, or polymer) can be identified using the spatial label present on the stochastic barcode on the bead, which can serve as a location address.

Examples of beads can include, but are not limited to, streptavidin beads, agarose beads, magnetic beads, Dynabeads®, MACS® microbeads, antibody conjugated beads (e.g., anti-immunoglobulin microbead), protein A conjugated beads, protein G conjugated beads, protein A/G conjugated beads, protein L conjugated beads, oligodT conjugated beads, silica beads, silica-like beads, anti-biotin microbead, anti-fluorochrome microbead, and BcMag™ Carboxy-Terminated Magnetic Beads.

In some embodiments, the bead can be associated with (e.g., impregnated with) quantum dots or fluorescent dyes to make it fluorescent in one fluorescence optical channel or multiple optical channels. In some embodiments, the bead can be associated with iron oxide or chromium oxide to make it paramagnetic or ferromagnetic. In some embodiments, the bead can be identifiable. In some embodiments, the bead can be imaged using a camera. In some embodiments, the bead can have a detectable code associated with the bead. For example, the bead can comprise an RFID tag. In some embodiments, the bead comprises a detectable tag (e.g., UPC code, electronic barcode, etched identifier). In some embodiments, the bead can change size, for example due to swelling in an organic or inorganic solution. The bead can be hydrophobic or hydrophilic. In some embodiments, the bead is biocompatible.

A solid support (e.g., bead) can be visualized. The solid support can comprise a visualizing tag (e.g., fluorescent dye). The solid support (e.g., bead) can be, for example, etched with an identifier (e.g., a number). In some embodiments, the identifier can be visualized through imaging the solid supports (e.g., beads).

A solid support can be made of, or comprise, one or more soluble, semi-soluble, or insoluble materials. The solid support may be referred to as “functionalized” when it includes a linker, a scaffold, a building block, or other reactive moiety attached thereto, whereas a solid support may be “nonfunctionalized” when it lacks such a reactive moiety attached thereto. The solid support can be employed free in solution, such as in a microtiter well format; in a flow-through format, such as in a column; or in a dipstick.

The solid support can comprise a membrane, paper, plastic, coated surface, flat surface, glass, slide, chip, or any combination thereof. The solid support can, for example, take the form of resins, gels, microspheres, or other geometric configurations. The solid support can, for example, comprise silica chips, microparticles, nanoparticles, plates, arrays, capillaries, flat supports such as glass fiber filters, glass surfaces, metal surfaces (steel, gold, silver, aluminum, silicon and copper), glass supports, plastic supports, silicon supports, chips, filters, membranes, microwell plates, slides, plastic materials including multiwell plates or membranes (e.g., formed of polyethylene, polypropylene, polyamide, polyvinylidenedifluoride), and/or wafers, combs, pins or needles (e.g., arrays of pins suitable for combinatorial synthesis or analysis) or beads in an array of pits or nanoliter wells of flat surfaces such as wafers (e.g., silicon wafers), wafers with pits with or without filter bottoms.

The solid support can, for example, comprise a polymer matrix (e.g., gel, hydrogel). In some embodiments, the polymer matrix is able to permeate intracellular space (e.g., around organelles). In some embodiments, the polymer matrix is able to be pumped throughout the circulatory system.

The solid support can be a biological molecule. For example, the solid support can be a nucleic acid, a protein, an antibody, a histone, a cellular compartment, a lipid, a carbohydrate, and the like. In some embodiments, solid supports that are biological molecules can be amplified, translated, transcribed, degraded, and/or modified (e.g., pegylated, sumoylated, acetylated, methylated). A solid support that is a biological molecule can, for example, provide spatial and time information in addition to the spatial label that is attached to the biological molecule. For example, a biological molecule can comprise a first conformation when unmodified, but can change to a second conformation when modified. The different conformations can expose stochastic barcodes of the disclosure to targets. For example, a biological molecule can comprise stochastic barcodes that are inaccessible due to folding of the biological molecule. Upon modification of the biological molecule (e.g., acetylation), the biological molecule can change conformation to expose the stochastic barcodes. The timing of the modification can provide another time dimension to the method of stochastic barcoding of the disclosure.

In some embodiments, the biological molecule comprising stochastic barcodes of the disclosure can be located in the cytoplasm of a cell. Upon activation, the biological molecule can move to the nucleus, whereupon stochastic barcoding can take place. In this way, modification of the biological molecule can encode additional space-time information for the targets identified by the stochastic barcodes.

The dimension label can provide information about the space-time of a biological event (e.g., cell division). For example, a dimension label can be added to a first cell, and the first cell can divide, generating a second daughter cell; the second daughter cell can comprise all, some or none of the dimension labels. The dimension labels can be activated in the original cell and the daughter cell. In this way, the dimension label can provide information about the time of stochastic barcoding in distinct spaces.

Substrates

A substrate can refer to a type of solid support. A substrate can refer to a solid support that can comprise stochastic barcodes of the disclosure. A substrate can comprise a plurality of microwells. A microwell can comprise a small reaction chamber of defined volume. A microwell can entrap one or more cells. In some embodiments, a microwell can entrap only one cell. In some embodiments, a microwell can entrap one or more solid supports. In some embodiments, a microwell can entrap only one solid support. In some embodiments, a microwell entraps a single cell and a single solid support (e.g., bead). In some embodiments, a microwell is sized such that it can only entrap a single cell. In some embodiments, a microwell is sized such that it can only entrap a single solid support (e.g., a bead). In some embodiments, a microwell is sized such that it can only entrap a single cell and a single solid support (e.g., a bead).

The microwells of a microwell array can be fabricated in a variety of shapes and sizes. Well geometries can include, but are not limited to, cylindrical, conical, hemispherical, rectangular, or polyhedral (e.g., three dimensional geometries comprised of several planar faces, for example, hexagonal columns, octagonal columns, inverted triangular pyramids, inverted square pyramids, inverted pentagonal pyramids, inverted hexagonal pyramids, or inverted truncated pyramids). The microwells can comprise a shape that combines two or more of these geometries. For example, a microwell may be partly cylindrical, with the remainder having the shape of an inverted cone. A microwell can include two side-by-side cylinders, one of larger diameter (e.g., that corresponds roughly to the diameter of the beads) than the other (e.g., that corresponds roughly to the diameter of the cells), that are connected by a vertical channel (that is, parallel to the cylinder axes) that extends the full length (depth) of the cylinders. The opening of the microwell can be at the upper surface of the substrate. The opening of the microwell can be at the lower surface of the substrate. The closed end (or bottom) of the microwell can be flat. The closed end (or bottom) of the microwell can have a curved surface (e.g., convex or concave). The shape and/or size of the microwell can be determined based on the types of cells or solid supports to be trapped within the microwells.

The portion of the substrate between the wells can have a topology. For example, the portion of the substrate between the wells can be rounded. The portion of the substrate between the wells can be pointed. The portion of the substrate between the wells can be flat. The portion of the substrate between the wells may not be flat. In some instances, the portion of the substrate between wells is rounded. In other words, the portion of the substrate that does not comprise a well can have a curved surface. The curved surface can be fabricated such that the highest point (e.g., apex) of the curved surface can be at the furthest point between the edges of two or more wells (e.g., equidistant from the wells). The curved surface can be fabricated such that the start of the curved surface is at the edge of a first microwell and creates a parabola that ends at the edge of a second microwell. This parabola can be extended in 2 dimensions to capture microwells nearby on the hexagonal grid of wells. The curved surface can be fabricated such that the surface between the wells is higher than, and/or curved relative to, the plane of the opening of the well. The height of the curved surface can be, or be at least, 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, or 7 or more micrometers. The height of the curved surface can be, or be at most, 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, or 7 or more micrometers.

Microwell dimensions can be, for example, characterized in terms of the diameter and depth of the well. As used herein, the diameter of the microwell refers to the largest circle that can be inscribed within the planar cross-section of the microwell geometry. The diameter of the microwells can range from about 1-fold to about 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell diameter can be at least 1-fold, at least 1.5-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, or at least 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell diameter can be at most 10-fold, at most 5-fold, at most 4-fold, at most 3-fold, at most 2-fold, at most 1.5-fold, or at most 1-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell diameter can be about 2.5-fold the diameter of the cells or solid supports to be trapped within the microwells.

The diameter of the microwells can be, for example, specified in terms of absolute dimensions. For example, the diameter of the microwells can range from about 5 to about 60 micrometers. The microwell diameter can be at least 5 micrometers, at least 10 micrometers, at least 15 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 45 micrometers, at least 50 micrometers, or at least 60 micrometers. The microwell diameter can be at most 60 micrometers, at most 50 micrometers, at most 45 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, at most 15 micrometers, at most 10 micrometers, or at most 5 micrometers. The microwell diameter can be about 30 micrometers.

The microwell depth can be chosen to provide efficient trapping of cells and solid supports. The microwell depth can be chosen to provide efficient exchange of assay buffers and other reagents contained within the wells. The ratio of diameter to height (i.e., aspect ratio) can be chosen such that once a cell and solid support settle inside a microwell, they will not be displaced by fluid motion above the microwell. The dimensions of the microwell can be chosen such that the microwell has sufficient space to accommodate a solid support and a cell of various sizes without being dislodged by fluid motion above the microwell. The depth of the microwells can range from about 1-fold to about 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be at least 1-fold, at least 1.5-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, or at least 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be at most 10-fold, at most 5-fold, at most 4-fold, at most 3-fold, at most 2-fold, at most 1.5-fold, or at most 1-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be about 2.5-fold the diameter of the cells or solid supports to be trapped within the microwells.

The depth of the microwells can be, for example, specified in terms of absolute dimensions. For example, the depth of the microwells can range from about 10 to about 60 micrometers. The microwell depth can be at least 10 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 50 micrometers, or at least 60 micrometers. The microwell depth can be at most 60 micrometers, at most 50 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, or at most 10 micrometers. The microwell depth can be about 30 micrometers.

The volume of the microwells can vary, for example ranging from about 200 micrometers³ to about 120,000 micrometers³. The microwell volume can be at least 200 micrometers³, at least 500 micrometers³, at least 1,000 micrometers³, at least 10,000 micrometers³, at least 25,000 micrometers³, at least 50,000 micrometers³, at least 100,000 micrometers³, or at least 120,000 micrometers³. The microwell volume can be at most 120,000 micrometers³, at most 100,000 micrometers³, at most 50,000 micrometers³, at most 25,000 micrometers³, at most 10,000 micrometers³, at most 1,000 micrometers³, at most 500 micrometers³, or at most 200 micrometers³. The microwell volume can be about 25,000 micrometers³. The microwell volume can fall within any range bounded by any of these values (e.g., from about 18,000 micrometers³ to about 30,000 micrometers³).
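For orientation, the volume of a microwell can be estimated from its diameter and depth. The sketch below assumes a purely cylindrical well with dimensions given in micrometers, which is only one of the geometries described above; the function names are illustrative.

```python
import math


def cylinder_volume_um3(diameter_um: float, depth_um: float) -> float:
    """Volume of a cylindrical well in cubic micrometers."""
    radius_um = diameter_um / 2.0
    return math.pi * radius_um ** 2 * depth_um


def um3_to_picoliters(volume_um3: float) -> float:
    """Convert cubic micrometers to picoliters (1 pL = 1,000 cubic micrometers)."""
    return volume_um3 / 1000.0


# Illustrative well: about 30 micrometers in diameter and 30 micrometers deep.
well_volume_um3 = cylinder_volume_um3(30.0, 30.0)
well_volume_pl = um3_to_picoliters(well_volume_um3)
```

Under these assumptions a 30 micrometer by 30 micrometer cylindrical well holds roughly 21,000 cubic micrometers, or about 21 picoliters.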

The volume of the microwell can also vary, for example, be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of liquid that can fit in the microwell can be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of liquid that can fit in the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of the microwell can be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of liquid that can fit in the microwell can be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of liquid that can fit in the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters.

The volumes of the microwells can be further characterized in terms of the variation in volume from one microwell to another. The coefficient of variation (expressed as a percentage) for microwell volume can range from about 1% to about 10%. The coefficient of variation for microwell volume can be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, or at least 10%. The coefficient of variation for microwell volume can be at most 10%, at most 9%, at most 8%, at most 7%, at most 6%, at most 5%, at most 4%, at most 3%, at most 2%, or at most 1%. The coefficient of variation for microwell volume can have any value within a range encompassed by these values, for example between about 1.5% and about 6.5%. In some embodiments, the coefficient of variation of microwell volume can be about 2.5%.
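The coefficient of variation is the standard deviation of the well volumes divided by their mean, expressed as a percentage. The following sketch uses the sample standard deviation, which is an assumption (the disclosure does not say whether a sample or population estimate is intended), and the measured volumes are made-up illustrative numbers.

```python
import statistics


def coefficient_of_variation_percent(volumes: list[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    if len(volumes) < 2:
        raise ValueError("at least two volumes are required")
    mean_volume = statistics.mean(volumes)
    if mean_volume == 0:
        raise ValueError("mean volume must be non-zero")
    return 100.0 * statistics.stdev(volumes) / mean_volume


# Hypothetical measured volumes for three wells (arbitrary volume units).
example_volumes = [24000.0, 25000.0, 26000.0]
cv_percent = coefficient_of_variation_percent(example_volumes)
```

For these hypothetical volumes the standard deviation is 1,000 against a mean of 25,000, so the coefficient of variation is 4%.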

The ratio of the volume of the microwells to the surface area of the beads (or to the surface area of a solid support to which stochastic barcode oligonucleotides may be attached) used in the methods, devices, and systems of the present disclosure can range, for example, from about 2.5 to about 1,520 micrometers. The ratio can be at least 2.5, at least 5, at least 10, at least 100, at least 500, at least 750, at least 1,000, or at least 1,520. The ratio can be at most 1,520, at most 1,000, at most 750, at most 500, at most 100, at most 10, at most 5, or at most 2.5. The ratio can be about 67.5. The ratio of microwell volume to the surface area of the bead (or solid support used for immobilization) can fall within any range bounded by any of these values (e.g., from about 30 to about 120).
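Because a volume divided by an area has units of length, the ratio above is expressed in micrometers. A minimal sketch of the calculation follows, assuming a cylindrical well and a spherical bead; both shape assumptions and the example dimensions are illustrative and are not stated in the disclosure as the basis for its values.

```python
import math


def cylinder_volume_um3(diameter_um: float, depth_um: float) -> float:
    """Volume of a cylindrical well in cubic micrometers."""
    return math.pi * (diameter_um / 2.0) ** 2 * depth_um


def sphere_surface_area_um2(diameter_um: float) -> float:
    """Surface area of a spherical bead in square micrometers (pi * d^2)."""
    return math.pi * diameter_um ** 2


def volume_to_surface_ratio_um(
    well_diameter_um: float, well_depth_um: float, bead_diameter_um: float
) -> float:
    """Ratio of well volume to bead surface area, in micrometers."""
    return cylinder_volume_um3(well_diameter_um, well_depth_um) / (
        sphere_surface_area_um2(bead_diameter_um)
    )


# Illustrative inputs: 30 x 30 micrometer well, 10 micrometer bead.
ratio_um = volume_to_surface_ratio_um(30.0, 30.0, 10.0)
```

With these illustrative inputs the ratio works out to 67.5 micrometers. That matches the "about 67.5" figure quoted above, although the disclosure does not state which dimensions that figure was derived from.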

The wells of the microwell array can be arranged in a one dimensional, two dimensional, or three-dimensional array. A three dimensional array can be achieved, for example, by stacking a series of two or more two dimensional arrays (that is, by stacking two or more substrates comprising microwell arrays).

The pattern and spacing between microwells can be chosen to optimize the efficiency of trapping a single cell and single solid support (e.g., bead) in each well, as well as to maximize the number of wells per unit area of the array. The microwells can be distributed according to a variety of random or non-random patterns. For example, they can be distributed entirely randomly across the surface of the array substrate, or they can be arranged in a square grid, rectangular grid, hexagonal grid, or the like. In some instances, the microwells are arranged hexagonally. The center-to-center distance (or spacing) between wells can vary from about 5 micrometers to about 75 micrometers. In some instances, the spacing between microwells is about 10 micrometers. In other embodiments, the spacing between wells is at least 5 micrometers, at least 10 micrometers, at least 15 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 45 micrometers, at least 50 micrometers, at least 55 micrometers, at least 60 micrometers, at least 65 micrometers, at least 70 micrometers, or at least 75 micrometers. The microwell spacing can be at most 75 micrometers, at most 70 micrometers, at most 65 micrometers, at most 60 micrometers, at most 55 micrometers, at most 50 micrometers, at most 45 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, at most 15 micrometers, at most 10 micrometers, or at most 5 micrometers. The microwell spacing can be about 55 micrometers. The microwell spacing can fall within any range bounded by any of these values (e.g., from about 18 micrometers to about 72 micrometers).
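The relationship between center-to-center spacing and the number of wells per unit area can be sketched for the hexagonal arrangement. In an ideal hexagonal lattice with spacing s, each well occupies an area of (√3/2)·s², so the density is 2/(√3·s²). The sketch below ignores edge effects, and the 400 mm² array area is a hypothetical value chosen only for illustration.

```python
import math


def wells_per_mm2(spacing_um: float) -> float:
    """Well density of an ideal hexagonal lattice, in wells per square millimeter.

    spacing_um is the center-to-center distance between adjacent wells.
    """
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    area_per_well_um2 = (math.sqrt(3) / 2.0) * spacing_um ** 2
    return 1.0e6 / area_per_well_um2  # 1 mm^2 = 1e6 square micrometers


def estimated_well_count(spacing_um: float, array_area_mm2: float) -> int:
    """Approximate well count for a given array area, ignoring edge effects."""
    return int(wells_per_mm2(spacing_um) * array_area_mm2)


# Illustrative: 55 micrometer spacing over a hypothetical 400 mm^2 array.
density = wells_per_mm2(55.0)
count = estimated_well_count(55.0, 400.0)
```

At a 55 micrometer spacing this idealized lattice gives on the order of 380 wells per square millimeter; halving the spacing would quadruple the density.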

The microwell array can comprise surface features between the microwells that are designed to help guide cells and solid supports into the wells and/or prevent them from settling on the surfaces between wells. Examples of suitable surface features can include, but are not limited to, domed, ridged, or peaked surface features that encircle the wells or straddle the surface between wells.

The total number of wells in the microwell array can be determined by the pattern and spacing of the wells and the overall dimensions of the array. The number of microwells in the array can range from about 96 to about 5,000,000 or more. The number of microwells in the array can be at least 96, at least 384, at least 1,536, at least 5,000, at least 10,000, at least 25,000, at least 50,000, at least 75,000, at least 100,000, at least 500,000, at least 1,000,000, or at least 5,000,000. The number of microwells in the array can be at most 5,000,000, at most 1,000,000, at most 75,000, at most 50,000, at most 25,000, at most 10,000, at most 5,000, at most 1,536, at most 384, or at most 96 wells. The number of microwells in the array can be about 96. The number of microwells can be about 150,000. The number of microwells in the array can fall within any range bounded by any of these values (e.g., from about 100 to 325,000).

Microwell arrays can be fabricated using any of a number of fabrication techniques. Examples of fabrication methods that may be used include, but are not limited to, bulk micromachining techniques such as photolithography and wet chemical etching, plasma etching, or deep reactive ion etching; micro-molding and micro-embossing; laser micromachining; 3D printing or other direct write fabrication processes using curable materials; and similar techniques.

Microwell arrays can be fabricated from any of a number of substrate materials. The choice of material can depend on the choice of fabrication technique, and vice versa. Examples of suitable materials can include, but are not limited to, silicon, fused-silica, glass, polymers (e.g., agarose, gelatin, hydrogels, polydimethylsiloxane (PDMS; elastomer), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), polyimide, cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET), epoxy resins, thiol-ene based resins), metals or metal films (e.g., aluminum, stainless steel, copper, nickel, chromium, and titanium), and the like. In some instances, the microwell comprises optical adhesive. In some instances, the microwell is made out of optical adhesive. In some instances, the microwell array comprises and/or is made out of PDMS. In some instances, the microwell is made of plastic. A hydrophilic material can be desirable for fabrication of the microwell arrays (e.g., to enhance wettability and minimize non-specific binding of cells and other biological material). Hydrophobic materials that can be treated or coated (e.g., by oxygen plasma treatment, or grafting of a polyethylene oxide surface layer) can also be used. The use of porous, hydrophilic materials for the fabrication of the microwell array may be desirable in order to facilitate capillary wicking/venting of entrapped air bubbles in the device. The microwell array can be fabricated from a single material. The microwell array can comprise two or more different materials that have been bonded together or mechanically joined.

Microwell arrays can be fabricated using substrates of any of a variety of sizes and shapes. For example, the shape (or footprint) of the substrate within which microwells are fabricated may be square, rectangular, circular, or irregular in shape. The footprint of the microwell array substrate can be similar to that of a microtiter plate. The footprint of the microwell array substrate can be similar to that of standard microscope slides, e.g., about 75 mm long×25 mm wide (about 3″ long×1″ wide), or about 75 mm long×50 mm wide (about 3″ long×2″ wide). The thickness of the substrate within which the microwells are fabricated can range from about 0.1 mm thick to about 10 mm thick, or more. The thickness of the microwell array substrate can be at least 0.1 mm thick, at least 0.5 mm thick, at least 1 mm thick, at least 2 mm thick, at least 3 mm thick, at least 4 mm thick, at least 5 mm thick, at least 6 mm thick, at least 7 mm thick, at least 8 mm thick, at least 9 mm thick, or at least 10 mm thick. The thickness of the microwell array substrate can be at most 10 mm thick, at most 9 mm thick, at most 8 mm thick, at most 7 mm thick, at most 6 mm thick, at most 5 mm thick, at most 4 mm thick, at most 3 mm thick, at most 2 mm thick, at most 1 mm thick, at most 0.5 mm thick, or at most 0.1 mm thick. The thickness of the microwell array substrate can be about 1 mm thick. The thickness of the microwell array substrate can be any value within these ranges, for example, the thickness of the microwell array substrate can be between about 0.2 mm and about 9.5 mm. The thickness of the microwell array substrate can be uniform.

A variety of surface treatments and surface modification techniques may be used to alter the properties of microwell array surfaces. Examples can include, but are not limited to, oxygen plasma treatments to render hydrophobic material surfaces more hydrophilic, the use of wet or dry etching techniques to smooth (or roughen) glass and silicon surfaces, adsorption or grafting of polyethylene oxide or other polymer layers (such as pluronic), or bovine serum albumin to substrate surfaces to render them more hydrophilic and less prone to non-specific adsorption of biomolecules and cells, the use of silane reactions to graft chemically-reactive functional groups to otherwise inert silicon and glass surfaces, etc. Photodeprotection techniques can be used to selectively activate chemically-reactive functional groups at specific locations in the array structure. For example, the selective addition or activation of chemically-reactive functional groups such as primary amines or carboxyl groups on the inner walls of the microwells may be used to covalently couple oligonucleotide probes, peptides, proteins, or other biomolecules to the walls of the microwells. The choice of surface treatment or surface modification utilized can depend on the type of surface property that is desired, on the type of material from which the microwell array is made, or on both.

The openings of microwells can be sealed, for example, during cell lysis steps to prevent cross hybridization of target nucleic acid between adjacent microwells. A microwell (or array of microwells) can be sealed or capped using, for example, a flexible membrane or sheet of solid material (i.e., a plate or platen) that clamps against the surface of the microwell array substrate, or a suitable bead, where the diameter of the bead is larger than the diameter of the microwell.

A seal formed using a flexible membrane or sheet of solid material can comprise, for example, inorganic nanopore membranes (e.g., aluminum oxides), dialysis membranes, glass slides, coverslips, elastomeric films (e.g., PDMS), or hydrophilic polymer films (e.g., a polymer film coated with a thin film of agarose that has been hydrated with lysis buffer).

Solid supports (e.g., beads) used for capping the microwells can comprise any of the solid supports (e.g., beads) of the disclosure. In some instances, the solid supports are cross-linked dextran beads (e.g., Sephadex). Cross-linked dextran beads can range from about 10 micrometers to about 80 micrometers in diameter. The cross-linked dextran beads used for capping can be from about 20 micrometers to about 50 micrometers in diameter. In some embodiments, the diameter of the beads can be at least about 10, 20, 30, 40, 50, 60, 70, 80 or 90% larger than the diameter of the microwells. The diameter of the beads used for capping can be at most about 10, 20, 30, 40, 50, 60, 70, 80 or 90% larger than the diameter of the microwells.

The seal or cap may allow buffer to pass into and out of the microwell, while preventing macromolecules (e.g., nucleic acids) from migrating out of the well. A macromolecule of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides may be blocked from migrating into or out of the microwell by the seal or cap. A macromolecule of at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides may be blocked from migrating into or out of the microwell by the seal or cap.

Solid supports (e.g., beads) can be distributed among a substrate. Solid supports (e.g., beads) can be distributed among wells of the substrate, removed from the wells of the substrate, or otherwise transported through a device comprising one or more microwell arrays by means of centrifugation or other non-magnetic means. A microwell of a substrate can be pre-loaded with a solid support. A microwell of a substrate can hold at least 1, 2, 3, 4, or 5 or more solid supports. A microwell of a substrate can hold at most 1, 2, 3, 4, or 5 or more solid supports. In some instances, a microwell of a substrate can hold one solid support.

Individual cells and beads can be compartmentalized using alternatives to microwells, for example, a single solid support and single cell could be confined within a single droplet in an emulsion (e.g., in a droplet digital microfluidic system).

Cells could potentially be confined within porous beads that themselves comprise the plurality of tethered stochastic barcodes. Individual cells and solid supports can be compartmentalized in any type of container, microcontainer, reaction chamber, reaction vessel, or the like.

Single cell stochastic barcoding can be performed without the use of microwells. Single cell stochastic barcoding assays can be performed without the use of any physical container. For example, stochastic barcoding without a physical container can be performed by embedding cells and beads in close proximity to each other within a polymer layer or gel layer to create a diffusional barrier between different cell/bead pairs. In another example, stochastic barcoding without a physical container can be performed in situ, in vivo, on an intact solid tissue, on an intact cell, and/or subcellularly.

Microwell arrays can be a consumable component of the assay system. Microwell arrays may be reusable. Microwell arrays can be configured for use as a stand-alone device for performing assays manually, or they may be configured to comprise a fixed or removable component of an instrument system that provides for full or partial automation of the assay procedure. In some embodiments of the disclosed methods, the bead-based libraries of stochastic barcodes can be deposited in the wells of the microwell array as part of the assay procedure. In some embodiments, the beads can be pre-loaded into the wells of the microwell array and provided to the user as part of, for example, a kit for performing stochastic barcoding and digital counting of nucleic acid targets.

In some embodiments, two mated microwell arrays can be provided, one pre-loaded with beads which are held in place by a first magnet, and the other for use by the user in loading individual cells. Following distribution of cells into the second microwell array, the two arrays can be placed face-to-face and the first magnet removed while a second magnet is used to draw the beads from the first array down into the corresponding microwells of the second array, thereby ensuring that the beads rest above the cells in the second microwell array and thus minimizing diffusional loss of target molecules following cell lysis, while maximizing efficient attachment of target molecules to the stochastic barcodes on the bead.

Microwell arrays of the disclosure can be pre-loaded with solid supports (e.g., beads). Each well of a microwell array can comprise a single solid support. At least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the wells in a microwell array can be pre-loaded with a single solid support. At most 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the wells in a microwell array can be pre-loaded with a single solid support. The solid support can comprise stochastic barcodes of the disclosure. Cellular labels of stochastic barcodes on different solid supports can be different. Cellular labels of stochastic barcodes on the same solid support can be the same.

Three Dimensional Substrates

A three-dimensional array can be any shape. A three-dimensional substrate can be made of any material used in a substrate of the disclosure. In some instances, a three-dimensional substrate comprises a DNA origami. DNA origami structures incorporate DNA as a building material to make nanoscale shapes. The DNA origami process can involve the folding of one or more long, “scaffold” DNA strands into a particular shape using a plurality of rationally designed “staple” DNA strands. The sequences of the staple strands can be designed such that they hybridize to particular portions of the scaffold strands and, in doing so, force the scaffold strands into a particular shape. The DNA origami can include a scaffold strand and a plurality of rationally designed staple strands. The scaffold strand can have any sufficiently non-repetitive sequence.

The sequences of the staple strands can be selected such that the DNA origami has at least one shape to which stochastic labels can be attached. In some embodiments, the DNA origami can be of any shape that has at least one inner surface and at least one outer surface. An inner surface can be any surface area of the DNA origami that is sterically precluded from interacting with the surface of a sample, while an outer surface is any surface area of the DNA origami that is not sterically precluded from interacting with the surface of a sample. In some embodiments, the DNA origami has one or more openings (e.g., two openings), such that an inner surface of the DNA origami can be accessed by particles (e.g., solid supports). For example, in certain embodiments the DNA origami has one or more openings that allow particles smaller than 10 micrometers, 5 micrometers, 1 micrometer, 500 nm, 400 nm, 300 nm, 250 nm, 200 nm, 150 nm, 100 nm, 75 nm, 50 nm, 45 nm or 40 nm to contact an inner surface of the DNA origami.

The DNA origami can change shape (conformation) in response to one or more certain environmental stimuli. Thus an area of the DNA origami can be an inner surface when the DNA origami takes on some conformations, but can be an outer surface when the device takes on other conformations. In some embodiments, the DNA origami can respond to certain environmental stimuli by taking on a new conformation.

In some embodiments, the staple strands of the DNA origami can be selected such that the DNA origami is substantially barrel- or tube-shaped. The staples of the DNA origami can be selected such that the barrel shape is closed at both ends or is open at one or both ends, thereby permitting particles to enter the interior of the barrel and access its inner surface. In certain embodiments, the barrel shape of the DNA origami can be a hexagonal tube.

In some embodiments, the staple strands of the DNA origami can be selected such that the DNA origami has a first domain and a second domain, wherein the first end of the first domain is attached to the first end of the second domain by one or more single-stranded DNA hinges, and the second end of the first domain is attached to the second end of the second domain by one or more molecular latches. The plurality of staples can be selected such that the second end of the first domain becomes unattached to the second end of the second domain if all of the molecular latches are contacted by their respective external stimuli. Latches can be formed from two or more staple strands, including at least one staple strand having at least one stimulus-binding domain that is able to bind to an external stimulus, such as a nucleic acid, a lipid or a protein, and at least one other staple strand having at least one latch domain that binds to the stimulus-binding domain. The binding of the stimulus-binding domain to the latch domain supports the stability of a first conformation of the DNA origami.

Synthesis of Stochastic Barcodes on Solid Supports and Substrates

A stochastic barcode can be synthesized on a solid support (e.g., bead). Pre-synthesized stochastic barcodes (e.g., comprising the 5′ amine that can link to the solid support) can be attached to solid supports (e.g., beads) through any of a variety of immobilization techniques involving functional group pairs on the solid support and the stochastic barcode. The stochastic barcode can comprise a functional group. The solid support (e.g., bead) can comprise a functional group. The stochastic barcode functional group and the solid support functional group can comprise, for example, biotin, streptavidin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), and any combination thereof. A stochastic barcode can be tethered to a solid support, for example, by coupling (e.g. using 1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide) a 5′ amino group on the stochastic barcode to the carboxyl group of the functionalized solid support. Residual non-coupled stochastic barcodes can be removed from the reaction mixture by performing multiple rinse steps. In some embodiments, the stochastic barcode and solid support are attached indirectly via linker molecules (e.g. short, functionalized hydrocarbon molecules or polyethylene oxide molecules) using similar attachment chemistries. The linkers can be cleavable linkers, e.g. acid-labile linkers or photo-cleavable linkers.

The stochastic barcodes can be synthesized on solid supports (e.g., beads) using any of a number of solid-phase oligonucleotide synthesis techniques, such as phosphodiester synthesis, phosphotriester synthesis, phosphite triester synthesis, and phosphoramidite synthesis. Single nucleotides can be coupled in step-wise fashion to the growing, tethered stochastic barcode. A short, pre-synthesized sequence (or block) of several oligonucleotides can be coupled to the growing, tethered stochastic barcode.

Stochastic barcodes can be synthesized by interspersing step-wise or block coupling reactions with one or more rounds of split-pool synthesis, in which the total pool of synthesis beads is divided into a number of individual smaller pools which are then each subjected to a different coupling reaction, followed by recombination and mixing of the individual pools to randomize the growing stochastic barcode sequence across the total pool of beads. Split-pool synthesis is an example of a combinatorial synthesis process in which a maximum number of chemical compounds are synthesized using a minimum number of chemical coupling steps. The potential diversity of the compound library thus created is determined by the number of unique building blocks (e.g. nucleotides) available for each coupling step, and the number of coupling steps used to create the library. For example, a split-pool synthesis comprising 10 rounds of coupling using 4 different nucleotides at each step will yield 4¹⁰ = 1,048,576 unique nucleotide sequences. In some embodiments, split-pool synthesis can be performed using enzymatic methods such as polymerase extension or ligation reactions rather than chemical coupling. For example, in each round of a split-pool polymerase extension reaction, the 3′ ends of the stochastic barcodes tethered to beads in a given pool can be hybridized with the 5′ ends of a set of semi-random primers, e.g. primers having a structure of 5′-(M)_(k)-(X)_(i)-(N)_(j)-3′, where (X)_(i) is a random sequence of nucleotides that is i nucleotides long (the set of primers comprising all possible combinations of (X)_(i)), (N)_(j) is a specific nucleotide (or series of j nucleotides), and (M)_(k) is a specific nucleotide (or series of k nucleotides), wherein a different deoxyribonucleotide triphosphate (dNTP) is added to each pool and incorporated into the tethered oligonucleotides by the polymerase.
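By way of non-limiting illustration, the split-pool diversity arithmetic described above can be sketched in Python. The function names `split_pool_diversity` and `simulate_split_pool`, the bead count, and the random seed are hypothetical and chosen only for this example; the simulation assumes each bead is assigned to a pool uniformly at random in every round, which is a simplification rather than a description of any particular synthesis protocol.

```python
import random


def split_pool_diversity(building_blocks, rounds):
    """Maximum number of distinct sequences: building_blocks ** rounds."""
    return building_blocks ** rounds


def simulate_split_pool(n_beads, alphabet, rounds, seed=0):
    """Toy split-pool simulation (assumes uniform random pool assignment)."""
    rng = random.Random(seed)
    beads = [""] * n_beads  # each entry is the sequence grown on one bead
    for _ in range(rounds):
        # Split: assign every bead to one pool per building block.
        pools = {base: [] for base in alphabet}
        for seq in beads:
            pools[rng.choice(alphabet)].append(seq)
        # Couple: each pool receives its own building block.
        # Pool: recombine all pools and mix before the next round.
        beads = [seq + base for base, pool in pools.items() for seq in pool]
        rng.shuffle(beads)
    return beads


if __name__ == "__main__":
    # 10 rounds with 4 nucleotides, as in the example in the text.
    print(split_pool_diversity(4, 10))
    beads = simulate_split_pool(n_beads=1000, alphabet="ACGT", rounds=10)
    print(len(set(beads)), "distinct sequences on", len(beads), "beads")
```

The closed-form count gives the potential diversity of the library; the number of distinct sequences actually observed in the simulation also depends on how many beads are carried through the rounds.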

The number of stochastic barcodes conjugated to or synthesized on a solid support can comprise at least 100, 1000, 10000, or 1000000 or more stochastic barcodes. The number of stochastic barcodes conjugated to or synthesized on a solid support can comprise at most 100, 1000, 10000, or 1000000 or more stochastic barcodes. The number of oligonucleotides conjugated to or synthesized on a solid support such as a bead can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10-fold more than the number of target nucleic acids in a cell. The number of oligonucleotides conjugated to or synthesized on a solid support such as a bead can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10-fold more than the number of target nucleic acids in a cell. At least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the stochastic barcode can be bound by a target nucleic acid. At most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the stochastic barcode can be bound by a target nucleic acid. At least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 or more different target nucleic acids can be captured by the stochastic barcode on the solid support. At most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 or more different target nucleic acids can be captured by the stochastic barcode on the solid support.

Samples Cells

A sample for use in the methods, compositions, systems, and kits of the disclosure can comprise one or more cells. In some embodiments, the cells are cancer cells excised from a cancerous tissue, for example, breast cancer, lung cancer, colon cancer, prostate cancer, ovarian cancer, pancreatic cancer, brain cancer, melanoma and non-melanoma skin cancers, and the like. In some instances, the cells are derived from a cancer but collected from a bodily fluid (e.g. circulating tumor cells). Non-limiting examples of cancers can include adenoma, adenocarcinoma, squamous cell carcinoma, basal cell carcinoma, small cell carcinoma, large cell undifferentiated carcinoma, chondrosarcoma, and fibrosarcoma.

In some embodiments, the cells are cells that have been infected with virus and contain viral oligonucleotides. In some embodiments, the viral infection can be caused by a virus selected from the group consisting of double-stranded DNA viruses (e.g. adenoviruses, herpes viruses, pox viruses), single-stranded (+ strand or “sense”) DNA viruses (e.g. parvoviruses), double-stranded RNA viruses (e.g. reoviruses), single-stranded (+ strand or sense) RNA viruses (e.g. picornaviruses, togaviruses), single-stranded (− strand or antisense) RNA viruses (e.g. orthomyxoviruses, rhabdoviruses), single-stranded ((+ strand or sense) RNA viruses with a DNA intermediate in their life-cycle) RNA-RT viruses (e.g. retroviruses), and double-stranded DNA-RT viruses (e.g. hepadnaviruses). Exemplary viruses can include, but are not limited to, SARS, HIV, coronaviruses, Ebola, Malaria, Dengue, Hepatitis C, Hepatitis B, and Influenza.

In some embodiments, the cells are bacterial cells. These can include cells from gram-positive bacteria and/or gram-negative bacteria. Examples of bacteria that may be analyzed using the disclosed methods, devices, and systems include, but are not limited to, Actinomadurae, Actinomyces israelii, Bacillus anthracis, Bacillus cereus, Clostridium botulinum, Clostridium difficile, Clostridium perfringens, Clostridium tetani, Corynebacterium, Enterococcus faecalis, Listeria monocytogenes, Nocardia, Propionibacterium acnes, Staphylococcus aureus, Staphylococcus epiderm, Streptococcus mutans, Streptococcus pneumoniae and the like. Gram negative bacteria include, but are not limited to, Afipia felis, Bacteroides, Bartonella bacilliformis, Bordetella pertussis, Borrelia burgdorferi, Borrelia recurrentis, Brucella, Calymmatobacterium granulomatis, Campylobacter, Escherichia coli, Francisella tularensis, Gardnerella vaginalis, Haemophilus aegyptius, Haemophilus ducreyi, Haemophilus influenzae, Helicobacter pylori, Legionella pneumophila, Leptospira interrogans, Neisseria meningitidis, Porphyromonas gingivalis, Providencia stuartii, Pseudomonas aeruginosa, Salmonella enteritidis, Salmonella typhi, Serratia marcescens, Shigella boydii, Streptobacillus moniliformis, Streptococcus pyogenes, Treponema pallidum, Vibrio cholerae, Yersinia enterocolitica, Yersinia pestis and the like. Other bacteria may include Mycobacterium avium, Mycobacterium leprae, Mycobacterium tuberculosis, Bartonella henselae, Chlamydia psittaci, Chlamydia trachomatis, Coxiella burnetii, Mycoplasma pneumoniae, Rickettsia akari, Rickettsia prowazekii, Rickettsia rickettsii, Rickettsia tsutsugamushi, Rickettsia typhi, Ureaplasma urealyticum, Diplococcus pneumoniae, Ehrlichia chaffeensis, Enterococcus faecium, Meningococci and the like.

In some embodiments, the cells are cells from fungi. Non-limiting examples of fungi that may be analyzed using the disclosed methods, devices, and systems include, but are not limited to, Aspergilli, Candidae, Candida albicans, Coccidioides immitis, Cryptococci, and combinations thereof.

In some embodiments, the cells are cells from protozoans or other parasites. Examples of parasites to be analyzed using the methods, devices, and systems of the present disclosure include, but are not limited to, Balantidium coli, Cryptosporidium parvum, Cyclospora cayetanensis, Encephalitozoa, Entamoeba histolytica, Enterocytozoon bieneusi, Giardia lamblia, Leishmaniae, Plasmodii, Toxoplasma gondii, Trypanosomae, trapezoidal amoeba, worms (e.g., helminths), particularly parasitic worms including, but not limited to, Nematoda (roundworms, e.g., whipworms, hookworms, pinworms, ascarids, filarids and the like), Cestoda (e.g., tapeworms).

As used herein, the term “cell” can refer to one or more cells. In some embodiments, the cells are normal cells, for example, human cells in different stages of development, or human cells from different organs or tissue types (e.g. white blood cells, red blood cells, platelets, epithelial cells, endothelial cells, neurons, glial cells, fibroblasts, skeletal muscle cells, smooth muscle cells, gametes, or cells from the heart, lungs, brain, liver, kidney, spleen, pancreas, thymus, bladder, stomach, colon, small intestine). In some embodiments, the cells can be undifferentiated human stem cells, or human stem cells that have been induced to differentiate. In some embodiments, the cells can be fetal human cells. The fetal human cells can be obtained from a mother pregnant with the fetus. In some embodiments, the cells are rare cells. A rare cell can be, for example, a circulating tumor cell (CTC), circulating epithelial cell, circulating endothelial cell, circulating endometrial cell, circulating stem cell, stem cell, undifferentiated stem cell, cancer stem cell, bone marrow cell, progenitor cell, foam cell, mesenchymal cell, trophoblast, immune system cell (host or graft), cellular fragment, cellular organelle (e.g. mitochondria or nuclei), pathogen infected cell, and the like.

In some embodiments, the cells are non-human cells, for example, other types of mammalian cells (e.g. mouse, rat, pig, dog, cow, or horse). In some embodiments, the cells are other types of animal or plant cells. In some embodiments, the cells can be any prokaryotic or eukaryotic cells.

In some embodiments, a first cell sample is obtained from a person not having a disease or condition, and a second cell sample is obtained from a person having the disease or condition. In some embodiments, the persons are different. In some embodiments, the persons are the same but cell samples are taken at different time points. In some embodiments, the persons are patients, and the cell samples are patient samples. The disease or condition can be a cancer, a bacterial infection, a viral infection, an inflammatory disease, a neurodegenerative disease, a fungal disease, a parasitic disease, a genetic disorder, or any combination thereof.

In some embodiments, cells suitable for use in the presently disclosed methods can range in size, for example ranging from about 2 micrometers to about 100 micrometers in diameter. In some embodiments, the cells can have diameters of at least 2 micrometers, at least 5 micrometers, at least 10 micrometers, at least 15 micrometers, at least 20 micrometers, at least 30 micrometers, at least 40 micrometers, at least 50 micrometers, at least 60 micrometers, at least 70 micrometers, at least 80 micrometers, at least 90 micrometers, or at least 100 micrometers. In some embodiments, the cells can have diameters of at most 100 micrometers, at most 90 micrometers, at most 80 micrometers, at most 70 micrometers, at most 60 micrometers, at most 50 micrometers, at most 40 micrometers, at most 30 micrometers, at most 20 micrometers, at most 15 micrometers, at most 10 micrometers, at most 5 micrometers, or at most 2 micrometers. The cells can have a diameter of any value within a range, for example from about 5 micrometers to about 85 micrometers. In some embodiments, the cells have diameters of about 10 micrometers.

In some embodiments, the cells are sorted prior to associating one or more of the cells with a bead and/or in a microwell. For example, the cells can be sorted by fluorescence-activated cell sorting or magnetic-activated cell sorting, or e.g., by flow cytometry. The cells can be filtered by size. In some instances, a retentate contains the cells to be associated with the bead. In some instances, the flow through contains the cells to be associated with the bead.

Methods of Whole Transcriptome Amplification

The disclosure provides methods for whole transcriptome amplification of a sample. “Whole transcriptome amplification” as used herein can refer to the amplification of all or a fraction of the transcriptome of a sample, such as a single cell. Amplification can be accomplished using various PCR or non-PCR based methods as disclosed herein.

The methods for whole transcriptome amplification disclosed herein can amplify a plurality of targets (including but not limited to mRNA, microRNA, siRNA, tRNA, rRNA, and any combination thereof) in a sample, such as a single cell. In some embodiments, the methods disclosed herein for whole transcriptome amplification can amplify all or a fraction of the transcripts, or the species of transcripts, in a sample, such as a single cell. “A species of transcripts” as used herein refers to all the transcripts from a single gene, or genetic locus. In some embodiments, a transcriptome can comprise at least 100, 1,000, 10,000, 100,000, 1,000,000 or more species of transcripts in a sample, such as a single cell. In some embodiments, a transcriptome can comprise at least 100, 1,000, 10,000, 100,000, 1,000,000, 10,000,000, 100,000,000 or more transcripts in a sample, such as a single cell. In some embodiments, the methods disclosed herein for whole transcriptome amplification can produce a WTA product comprising at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% of transcripts in a sample, such as a single cell. In some embodiments, the methods disclosed herein for whole transcriptome amplification can produce a WTA product comprising at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% of the species of transcripts in a sample, such as a single cell.

In some embodiments, for whole transcriptome amplification, one or more universal primer binding sites can be added to the targets, such as mRNA molecules, in a sample. In various embodiments, the methods disclosed herein comprise labeling a plurality of targets, such as mRNA molecules, in a sample. In some embodiments, labeling the plurality of targets comprises adding a stochastic barcode to one or more of the plurality of targets. The stochastic barcode can comprise one or more universal labels that comprise binding sites for amplification primers, such as PCR primers. In some embodiments, the stochastic barcodes can comprise a universal label, a molecular label, a cellular label, a spatial label, a target-specific region, or any combination thereof. The stochastic barcodes can be added to the targets in a sequence dependent or, preferably, sequence independent manner.

Quasi-Symmetric Stochastic Barcodes

Methods for whole transcriptome amplification using quasi-symmetric stochastic barcodes are provided. A non-limiting embodiment of the method is shown in FIGS. 1A and 1B. In FIGS. 1A and 1B, a target 105 can comprise a poly-A tail. A target 105 can be an mRNA. The target 105 can be hybridized to a stochastic barcode 110. The stochastic barcode 110 can comprise a number of labels. For example, the stochastic barcode 110 can comprise a target-specific region (e.g., oligo dT for binding to poly-A tails of mRNAs) 115, a molecular label 120, a cellular label 125, and a first universal label 130. The stochastic barcode 110 can be reverse transcribed 135 using a reverse transcriptase, thereby generating a labelled-cDNA molecule. Excess stochastic barcodes 111 can be treated at step 140 with a degradation enzyme 145. The degradation enzyme 145 can be, for example, an exonuclease. The labelled-cDNA molecule can be tailed in step 150. Tailing can comprise, for example, homopolymer tailing. For example, a homopolymer tail 155 can be appended to the 3′ end of the labelled-cDNA molecule, thereby generating a tailed molecule 156. In some instances, excess stochastic barcodes 111 that were resistant to exonuclease can be tailed as well.

The tailed molecule 156 can be contacted with a second strand synthesis primer 165. The second strand synthesis primer 165 can comprise a region 166 that is complementary to the homopolymer tail 155. The second strand synthesis primer 165 can comprise a restriction site 170. The second strand synthesis primer 165 can comprise a second universal label 131. The second universal label 131 can be shorter than the first universal label 130. The second universal label 131 can be different than the first universal label 130. The second strand synthesis primer can be extended 175, thereby generating at step 180 a quasi-symmetric stochastically barcoded nucleic acid 181. In some instances, the excess stochastic barcode 111 can be tailed and extended as well.

The excess stochastic barcode 111 that is tailed and extended can, in some embodiments, form a panhandle structure 185. The panhandle structure can be formed by hybridization of the first and second universal labels 130 and 131 on each end of the tailed and extended excess stochastic barcode 111. The panhandle structure 185 can prevent amplification of the tailed and extended excess stochastic barcode.

The quasi-symmetric stochastically barcoded nucleic acid 181 can be amplified 190. Amplification can be performed, for example, with a WTA (whole transcriptome amplification) primer 195. The WTA primer 195 can hybridize with the first and second universal labels 130 and 131. The quasi-symmetric stochastically barcoded nucleic acid 181 can be amplified, for example, with a whole transcriptome amplification primer, thereby producing a quasi-symmetric stochastically barcoded amplicon.

The WTA amplified quasi-symmetric stochastically barcoded nucleic acid can be prepared for a sequencing library (e.g., for generating sequencing reads). For example, as shown in FIGS. 2A and 2B, the restriction site 205 can be cleaved at step 210, thereby generating an asymmetric stochastically barcoded nucleic acid 215. The asymmetric stochastically barcoded nucleic acid 215 can be amplified with a degenerate primer 220. The degenerate primer 220 can comprise a polynucleotide sequence that can comprise a third universal label 225. The third universal label 225 can be different from the first and second universal labels 130 and 131. The degenerate primer 220 can comprise a random multimer sequence. The random multimer sequence can hybridize randomly to the sense and/or anti-sense strand of the asymmetric stochastically barcoded nucleic acid. For example, two products can be made. When the degenerate primer amplifies off the sense strand it can generate a 5′ product 230. The 5′ product 230 can, for example, comprise the second universal label 131 if the restriction digest 210 is not performed to completion. In this way, the second universal label 131 can be a second mechanism for breaking the symmetry of the quasi-symmetric stochastically barcoded nucleic acid 181. When the degenerate primer amplifies off the anti-sense strand it can generate a 3′ product 235. The 3′ product 235 can comprise the sequence of the stochastic barcode.

The 5′ and 3′ products 230 and 235 can be used for downstream sequencing library preparation. For example, the 5′ and 3′ products 230 and 235 can be amplified with sequencing library amplification primers 240/245. One of the sequencing library amplification primers 240 can hybridize to the third universal label 225. One of the sequencing library amplification primers 245 can hybridize to the first universal label 130. The sequencing library amplification primers 245 may not be able to hybridize to the second universal label 131. The sequencing library amplification primers 245 may hybridize less efficiently to the second universal label 131. Amplification of the 5′ product 230 may not occur. Amplification of the 5′ product 230 may occur less efficiently. In this way, the second universal label 131 can be a mechanism for breaking the symmetry of the quasi-symmetric stochastically barcoded nucleic acid 181. The 3′ product can be the majority product subjected to a sequencing reaction. The results of the sequencing reaction can be more useful and efficient because only the 3′ product is sequenced, thereby generating the data comprising the stochastic barcode. The sequencing data can be used in downstream methods of the disclosure such as binning, counting, and estimating the number of original target mRNAs in a single cell.
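By way of non-limiting illustration, the binning and counting of sequencing data mentioned above can be sketched in Python. The sketch assumes each read has already been parsed into a hypothetical tuple of (cellular label, molecular label, gene); the function name `count_molecules` and the example labels are illustrative only, and the sketch ignores sequencing errors within the labels.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular labels per (cellular label, gene) bin.

    `reads` is an iterable of (cell_label, molecular_label, gene) tuples;
    this tuple layout is an assumption made for the example.
    """
    labels_per_bin = defaultdict(set)
    for cell_label, molecular_label, gene in reads:
        # Reads sharing all three fields are treated as amplification
        # copies of one original molecule and are counted once.
        labels_per_bin[(cell_label, gene)].add(molecular_label)
    return {key: len(labels) for key, labels in labels_per_bin.items()}


if __name__ == "__main__":
    example_reads = [
        ("cell1", "AAAA", "GAPDH"),
        ("cell1", "AAAA", "GAPDH"),  # duplicate read of the same molecule
        ("cell1", "CCCC", "GAPDH"),
        ("cell2", "AAAA", "ACTB"),
    ]
    print(count_molecules(example_reads))
```

Collapsing reads by molecular label before counting is what lets the count approximate the number of original target molecules rather than the number of amplified copies.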

Cell Lysis

Following the distribution of cells and stochastic barcodes, the cells can be lysed to liberate the target molecules. Cell lysis can be accomplished by any of a variety of means, for example, by chemical or biochemical means, by osmotic shock, or by means of thermal lysis, mechanical lysis, or optical lysis. Cells can be lysed by addition of a cell lysis buffer comprising a detergent (e.g. SDS, Li dodecyl sulfate, Triton X-100, Tween-20, or NP-40), an organic solvent (e.g. methanol or acetone), or digestive enzymes (e.g. proteinase K, pepsin, or trypsin), or any combination thereof. To increase the association of a target and a stochastic barcode, the rate of the diffusion of the target molecules can be altered by, for example, reducing the temperature and/or increasing the viscosity of the lysate.

Association of Stochastic Barcodes to Targets

Following lysis of the cells and release of nucleic acid molecules therefrom, the nucleic acid molecules can randomly associate with the stochastic barcodes of the co-localized solid support. Association can comprise hybridization of a stochastic barcode's target recognition region to a complementary portion of the target nucleic acid molecule (e.g., oligo dT of the stochastic barcode can interact with a poly-A tail of a target). The assay conditions used for hybridization (e.g. buffer pH, ionic strength, temperature, etc.) can be chosen to promote formation of specific, stable hybrids.

In some embodiments, the methods disclosed herein can comprise placing the stochastic barcodes in close proximity with the sample, lysing the sample, associating distinct targets with the stochastic barcodes, amplifying the targets and/or digitally counting the targets. The method can, in some embodiments, further comprise analyzing and/or visualizing the information obtained from the spatial labels on the stochastic barcodes. The stochastic barcodes can be associated with the targets using a variety of methods, such as primer-based extension or transcription, ligation, transposome-based ligation, or any combination thereof. In some embodiments, a sample can comprise a total amount of targets that is, is about, or is less than 1 pg, 2 pg, 3 pg, 4 pg, 5 pg, 6 pg, 7 pg, 8 pg, 9 pg, 10 pg, 20 pg, 30 pg, 40 pg, 50 pg, 60 pg, 70 pg, 80 pg, 90 pg, 100 pg, 200 pg, 300 pg, 400 pg, 500 pg, 600 pg, 700 pg, 800 pg, 900 pg, 1 ng, or a range between any two of the above values.

Attachment can further comprise ligation of a stochastic barcode's target recognition region and a portion of the target nucleic acid molecule. For example, the target binding region can comprise a nucleic acid sequence that can be capable of specific hybridization to a restriction site overhang (e.g. an EcoRI sticky-end overhang). The assay procedure can further comprise treating the target nucleic acids with a restriction enzyme (e.g. EcoRI) to create a restriction site overhang. The stochastic barcode can then be ligated to any nucleic acid molecule comprising a sequence complementary to the restriction site overhang. A ligase (e.g., T4 DNA ligase) may be used to join the two fragments.
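By way of non-limiting illustration, the creation of EcoRI fragments described above can be sketched in Python. EcoRI recognizes the sequence GAATTC and cuts between G and A on each strand, leaving a 5′ AATT overhang. The sketch models only the top strand of a double-stranded target, and the function name `ecori_digest` and the example sequence are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1   # top strand is cut between G and AATTC
ECORI_OVERHANG = "AATT"  # 5' overhang left on each cut end


def ecori_digest(sequence):
    """Return top-strand fragments produced by cutting at every EcoRI site.

    Simplification for this example: only the top strand is modelled, so
    each fragment downstream of a cut begins with the AATT overhang bases.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    position = sequence.find(ECORI_SITE)
    while position != -1:
        cut = position + ECORI_CUT_OFFSET
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(ECORI_SITE, position + 1)
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    for fragment in ecori_digest("AAAGAATTCTTT"):
        print(fragment, fragment.startswith(ECORI_OVERHANG))
```

A target binding region designed to hybridize to the AATT overhang could then, in principle, pair with any fragment end produced this way before ligation.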

The labeled targets from a plurality of cells (or a plurality of samples) (e.g., target-barcode molecules) can be subsequently pooled, for example by retrieving the stochastic barcodes and/or the beads to which the target-barcode molecules are attached. The retrieval of solid support-based collections of attached target-barcode molecules can be implemented by use of magnetic beads and an externally-applied magnetic field. Once the target-barcode molecules have been pooled, all further processing may proceed in a single reaction vessel. Further processing can include, for example, reverse transcription reactions, amplification reactions, cleavage reactions, dissociation reactions, and/or nucleic acid extension reactions. Further processing reactions may be performed within a single microwell containing a reaction mixture and/or a sample such as a single cell, that is, without first pooling the labeled target nucleic acid molecules from a plurality of cells.

Excess stochastic barcodes can be degraded with an enzyme. The enzyme can be a nuclease such as, for example, an exonuclease or an endonuclease. Exemplary nucleases can include DNase I, Cas9 nuclease, endonuclease III, endonuclease IV, endonuclease V, endonuclease VIII, exonuclease I, exonuclease III, exonuclease V, micrococcal nuclease, T7 endonuclease, T7 exonuclease, uracil glycosylase inhibitor, and uracil DNA glycosylase. In some embodiments, at least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes can be degraded with an enzyme (e.g., exonuclease). In some embodiments, at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes can be degraded with an enzyme (e.g., exonuclease). In some embodiments, at least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes may escape degradation with an enzyme (e.g., exonuclease). In some embodiments, at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes may escape degradation with an enzyme (e.g., exonuclease).

A sample comprising, for example, a cell, organ, or tissue thin section, can be contacted with stochastic barcodes. In some embodiments, the stochastic barcodes can be immobilized on a solid support. The solid supports can be free floating. The solid supports can be embedded in a semi-solid or solid array. The stochastic barcodes may not be associated with solid supports. The stochastic barcodes can be individual nucleotides. The stochastic barcodes can be associated with a substrate. When stochastic barcodes are in close proximity to targets, the targets can hybridize to the stochastic barcode. The stochastic barcodes can be contacted at a non-depletable ratio such that each distinct target can associate with a distinct stochastic barcode of the disclosure. To ensure efficient association between the target and the stochastic barcode, the targets can be crosslinked to the stochastic barcode.

The probability that two distinct targets of a sample can contact the same unique stochastic barcode can be at least 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², or 10⁻¹ or more. The probability that two distinct targets of a sample can contact the same unique stochastic barcode can be at most 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², or 10⁻¹ or more. The probability that two targets of the same gene from the same cell can contact the same stochastic barcode can be at least 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², or 10⁻¹ or more. The probability that two targets of the same gene from the same cell can contact the same stochastic barcode can be at most 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², or 10⁻¹ or more.
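As a rough illustration of how such probabilities scale with barcode diversity, the following Python sketch assumes that each target draws a barcode uniformly at random and independently from a pool of distinct barcodes. This uniform model, and the function names `pair_collision_probability` and `prob_target_label_is_unique`, are illustrative assumptions and are not taken from the disclosure.

```python
def pair_collision_probability(n_barcodes: int) -> float:
    """Probability that two given targets receive the same barcode,
    assuming uniform, independent draws from n_barcodes barcodes."""
    if n_barcodes < 1:
        raise ValueError("n_barcodes must be at least 1")
    return 1.0 / n_barcodes


def prob_target_label_is_unique(n_barcodes: int, n_targets: int) -> float:
    """Probability that one given target shares its barcode with none of
    the other n_targets - 1 targets, under the same uniform assumption."""
    if n_targets < 1:
        raise ValueError("n_targets must be at least 1")
    return (1.0 - 1.0 / n_barcodes) ** (n_targets - 1)


if __name__ == "__main__":
    # Hypothetical pool sizes chosen only for illustration.
    for n in (10, 1_000, 1_000_000):
        print(n, pair_collision_probability(n))
```

Under this simplified model, a pool of 10⁶ distinct barcodes corresponds to a pairwise collision probability of 10⁻⁶, and a pool of 10 corresponds to 10⁻¹.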

In some embodiments, cells from a population of cells can be separated (e.g., isolated) into wells of a substrate of the disclosure. The population of cells can be diluted prior to separating. For example, the population of cells can be diluted such that at least 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100% of wells of the substrate receive a single cell. In some embodiments, the population of cells can be diluted such that at most 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100% of wells of the substrate receive a single cell. In some embodiments, the population of cells can be diluted such that the number of cells in the diluted population is at least 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100% of the number of wells on the substrate. In some embodiments, the population of cells is diluted such that the number of cells is about 10% of the number of wells in the substrate.

Distribution of single cells into wells of the substrate can follow a Poisson distribution. For example, there can be, or can be at least, a 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10% or more probability that a well of the substrate has more than one cell. In some embodiments, there can be at least a 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10% or more probability that a well of the substrate has more than one cell. Distribution of single cells into wells of the substrate can be random. Distribution of single cells into wells of the substrate can be non-random. The cells can be separated such that a well of the substrate receives only one cell.
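For illustration, when cell loading follows a Poisson distribution, the probability that a well holds more than one cell can be estimated from the mean number of cells per well. The Python sketch below assumes that this mean equals the ratio of cells to wells; the function names are hypothetical and the model is a simplification.

```python
import math


def prob_exactly_one_cell(cells_per_well: float) -> float:
    """Poisson probability that a well receives exactly one cell."""
    return cells_per_well * math.exp(-cells_per_well)


def prob_more_than_one_cell(cells_per_well: float) -> float:
    """Poisson probability that a well receives two or more cells.

    cells_per_well is the assumed mean occupancy, e.g. 0.1 when the
    number of cells is about 10% of the number of wells.
    """
    p_zero = math.exp(-cells_per_well)
    return 1.0 - p_zero - prob_exactly_one_cell(cells_per_well)


if __name__ == "__main__":
    mean = 0.1  # cells are about 10% of the number of wells
    print(f"P(exactly one cell)   = {prob_exactly_one_cell(mean):.4f}")
    print(f"P(more than one cell) = {prob_more_than_one_cell(mean):.4f}")
```

With a mean of 0.1 cells per well, this model gives a probability of roughly 0.5% that a well contains more than one cell.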

Stochastic Barcodes on Solid Supports

The methods of whole transcriptome amplification disclosed herein can, in some embodiments, comprise steps of removing non-stochastically labeled targets or non-targets, etc. In some embodiments, the stochastic barcoding (e.g., labeling, indexing) step of the methods of the disclosure can be performed on a solid support. In some embodiments, the methods disclosed herein can comprise a step of removing non-stochastically labeled targets, non-targets such as DNA, or other cellular components or reagents. In some embodiments, the removing can comprise washing the solid support after contacting the solid support, such as beads, with the targets from a sample. In some embodiments, the removing can comprise washing the solid support after extension of the stochastic barcodes on the solid support. In some embodiments, the washing can comprise collecting the solid supports, such as beads, from the reaction mixture.

As shown in FIGS. 3A, 3B, and 3C, the stochastic barcode 310 can be attached (e.g., conjugated, covalently attached, non-covalently attached) to a solid support 312. A reverse transcription reaction 335 can be performed on the solid support 312. Excess stochastic barcodes 310 attached to the solid support 312 can be removed (e.g., by washing, by magnets). The reverse transcription reaction 335 can produce a first strand labelled cDNA 340. The first strand labelled cDNA 340 can be contacted with a second strand synthesis primer 345. The second strand synthesis primer 345 can comprise a random multimer sequence 350. The second strand synthesis primer 345 can comprise a second universal label 355. The second strand synthesis primer 345 can be extended (e.g., by primer extension), thereby generating a quasi-symmetric double-stranded labelled cDNA 360. The quasi-symmetric double-stranded labelled cDNA 360 can be amplified, for example, with a whole transcriptome amplification primer, thereby producing a quasi-symmetric stochastically barcoded amplicon. The method can be continued as described in FIGS. 2A and 2B, wherein the quasi-symmetric stochastically barcoded amplicon can be contacted with a degenerate primer 365 and undergo random priming. The degenerate primer can comprise a third universal label (U2). The 3′ (375) and 5′ (370) products resulting from the random priming with the degenerate primer can be amplified with sequencing library amplification primers. The 3′ product 375 can be favored. The reaction can be sequenced and subjected to downstream methods of the disclosure such as sequencing, counting, and estimating the number of target mRNAs in a single cell.

Reverse Transcription

In some embodiments, cDNA synthesis, such as by reverse transcription, can be used to associate stochastic barcodes to targets, e.g., mRNAs. For example, the stochastic target-barcode conjugate can comprise the stochastic barcode and a complementary sequence of all or a portion of the target nucleic acid (i.e., a stochastically barcoded cDNA molecule). Reverse transcription of the associated RNA molecule can occur by the addition of a reverse transcription primer along with the reverse transcriptase. The reverse transcription primer can be an oligo-dT primer, a random hexanucleotide primer, or a target-specific oligonucleotide primer. Oligo-dT primers can be 12-18 nucleotides in length and bind to the endogenous poly-A tail at the 3′ end of mammalian mRNA. Random hexanucleotide primers can bind to mRNA at a variety of complementary sites. Target-specific oligonucleotide primers typically selectively prime the mRNA of interest.

FIG. 5 illustrates an exemplary embodiment of the stochastic barcoding method of the disclosure. A sample (e.g., section of a sample, thin slice, or one or more cells) can be contacted with a solid support comprising a stochastic barcode. Targets in the sample can be associated with the stochastic barcodes. The solid supports can be collected. cDNA synthesis can be performed on the solid support. cDNA synthesis can be performed off the solid support. cDNA synthesis can incorporate the label information from the labels in the stochastic barcode into the new cDNA target molecule being synthesized, thereby generating a target-barcode molecule. The target-barcode molecules can be amplified using PCR. The sequence of the targets and the labels of the stochastic barcode on the target-barcode molecule can be determined by sequencing methods.

After the synthesis of a first cDNA strand, various methods can be used to synthesize a second strand to produce a double-stranded cDNA. Preferably, the second strand synthesis may not be conducted in a target-specific manner. For example, a random primer can be used that binds to the first cDNA strand, and can be extended using a DNA polymerase. In some embodiments, homopolymer tailing can be used for second strand synthesis. In some embodiments, second strand synthesis can be conducted in a primer-independent manner.

Second Strand Synthesis Using Homopolymer Tailing

In some embodiments, the methods disclosed herein provide for homopolymer tailing of the single-stranded cDNA molecules obtained after reverse transcription. Homopolymer tailing can comprise reaction with a terminal deoxynucleotidyl transferase (TdT) in the presence of a selected dNTP, to form a homopolymer region at the 3′ cDNA ends. The homopolymer tailing reaction can be performed on the RNA/cDNA duplex or single-stranded cDNA. If the RNA strand is present, the efficiency of the reaction can, in some embodiments, be enhanced by initial digestion with a 5′ exonuclease, to expose the 3′ fragment ends, or by carrying out the reaction in the presence of cobalt ions. The homopolymer tail can comprise a polyA tail, a polyT tail, a polyU tail, a polyC tail, and/or a polyG tail.

The homopolymer tailing reaction can introduce at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides to the cDNA. The homopolymer tailing reaction can introduce at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides to the cDNA. The enzymes (e.g., RNase and/or TdT) and selected dNTP are removed, for example, by cDNA fragment precipitation. In some instances, excess stochastic barcodes that escaped degradation by the exonuclease enzyme can be homopolymer tailed. These can be considered reaction side products.

The single-stranded cDNA can be contacted with a second strand synthesis primer to initiate second strand synthesis, thereby generating a double-stranded labelled cDNA. The second strand synthesis primer can comprise a region complementary to the homopolymer tail (e.g., it can bind to the homopolymer tail sequence). For example, if the homopolymer tail is comprised of adenines, then the region complementary to the homopolymer tail can comprise thymidines. The length of the region complementary to the homopolymer tail can be the same as the homopolymer tail. The length of the region complementary to the homopolymer tail can be, or be at least, 10, 20, 30, 40, 50, 60, or 70% or more shorter than the homopolymer tail. The length of the region complementary to the homopolymer tail can be, or be at most, 10, 20, 30, 40, 50, 60, or 70% or more shorter than the homopolymer tail. The length of the region complementary to the homopolymer tail can be, or be at least, 10, 20, 30, 40, 50, 60, or 70% or more longer than the homopolymer tail. The length of the region complementary to the homopolymer tail can be, or be at most, 10, 20, 30, 40, 50, 60, or 70% or more longer than the homopolymer tail.
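The relationship between a homopolymer tail and the complementary region of a second strand synthesis primer can be sketched in Python as follows. The function name `tail_binding_region` and its `percent_shorter` parameter are hypothetical and serve only to illustrate the base pairing and the relative-length options described above, restricted here to DNA bases.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def tail_binding_region(tail_base: str, tail_length: int,
                        percent_shorter: int = 0) -> str:
    """Return a primer region complementary to a DNA homopolymer tail.

    tail_base       -- the single base making up the tail (A, T, G, or C)
    tail_length     -- number of nucleotides in the tail
    percent_shorter -- how much shorter than the tail the region is (0-99)
    """
    if tail_base not in COMPLEMENT:
        raise ValueError("tail_base must be one of A, T, G, C")
    if not 0 <= percent_shorter < 100:
        raise ValueError("percent_shorter must be between 0 and 99")
    # Integer arithmetic avoids floating-point rounding of the length.
    region_length = tail_length * (100 - percent_shorter) // 100
    return COMPLEMENT[tail_base] * region_length


if __name__ == "__main__":
    # A 20-nucleotide polyA tail with a region 10% shorter than the tail.
    print(tail_binding_region("A", 20, percent_shorter=10))
```

Because the tail consists of a single repeated base, the complementary region reads the same in either orientation, so no reversal step is needed in this sketch.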

The second strand synthesis primer can comprise a cleavage site. The cleavage site can be a restriction endonuclease cleavage site. Exemplary restriction endonucleases can include BamHI, EcoRI, AleI, ApaI, BglII, BsaI, KpnI, or any other restriction endonuclease.

The second strand synthesis primer can comprise a second universal label. The universal label of the second strand synthesis primer (e.g., second universal label) can be the same as the universal label on the stochastic barcode (e.g., on the 5′ end of the single-stranded cDNA molecule, i.e., first universal label). The second universal label can comprise a sequence that is a subset of the first universal label. For example, the second universal label can comprise at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the sequence of the first universal label. The second universal label can comprise at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the sequence of the first universal label. The second universal label can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides shorter or longer than the first universal label. The second universal label can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides shorter or longer than the first universal label. In some instances, the second universal label is shorter than the first universal label by 2 nucleotides. The second universal label can differ from the first universal label by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The second universal label can differ from the first universal label by at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The second universal label can be, or be at most, 99, 98, 97, 96, 95, 94, 93, 92, 91 or 90% or less identical to said first universal label. The second universal label may not be identical to said first universal label. The second universal label can hybridize to said first universal label over at least 50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% of said first universal label. The second universal label can be at most 99% identical to said first universal label over 50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100% of said first universal label. The sequence of the first universal label can be a sequencing primer binding site (e.g., Illumina read 2 sequence). The sequence of the second universal label can be a modified sequencing primer binding site (e.g., Illumina modified read 2 sequence).

In some instances, the first universal label and the second universal label are able to hybridize to each other (e.g., for use in suppression PCR). The first and second universal labels can hybridize with at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more mismatches. The first and second universal labels can hybridize with at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more mismatches. The extent of hybridization can relate to the amount of suppression occurring during suppression PCR. For example, sequences that can hybridize strongly can be suppressed more than sequences that hybridize weakly.

The second strand synthesis primer can be extended (e.g., using primer extension) over the length of the first cDNA strand, thereby generating a quasi-symmetric stochastically barcoded nucleic acid. The quasi-symmetric stochastically barcoded nucleic acid can comprise the sequence of the second strand synthesis primer, the second universal label, and the labelled cDNA molecule (e.g., first cDNA strand, i.e., including the sequence of the target-binding region and the stochastic barcode). In this way the quasi-symmetric stochastically barcoded nucleic acid can comprise a universal label at each end of the molecule.

In some instances, the second strand synthesis primer can anneal to the homopolymer tail on the tailed excess stochastic barcode. The second strand synthesis primer can be extended to generate a synthesized second strand cDNA molecule that incorporates the sequence of the excess stochastic barcode. In some instances, the synthesized second strand cDNA molecule may not comprise a target (e.g., target polynucleotide).

Second Strand Synthesis with Random Priming

In some embodiments, as exemplified in FIGS. 3A, 3B, and 3C, the methods of the disclosure may not comprise homopolymer tailing. The methods of the disclosure can provide for random priming. The method can comprise contacting a target nucleic acid (e.g., target RNA, target polyadenylated transcript, target mRNA) with a stochastic barcode of the disclosure. For example, the stochastic barcode can comprise an oligo dT region which can hybridize with the poly-A tail of the target nucleic acid. The stochastic barcode can be attached to a solid support. The stochastic barcode may not be attached to a solid support. The target RNA can be reverse transcribed using the stochastic barcode as a primer, thereby resulting in a single-stranded labelled cDNA molecule.

Excess stochastic barcodes can be degraded with an enzyme. The enzyme can be a nuclease such as, for example, an exonuclease or an endonuclease. Exemplary nucleases can include DNase I, Cas9 nuclease, endonuclease III, endonuclease IV, endonuclease V, endonuclease VIII, exonuclease I, exonuclease III, exonuclease V, micrococcal nuclease, T7 endonuclease, T7 exonuclease, uracil glycosylase inhibitor, and uracil DNA glycosylase. At least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes can be degraded with an enzyme (e.g., exonuclease). At most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes can be degraded with an enzyme (e.g., exonuclease). At least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes may escape degradation with an enzyme (e.g., exonuclease). At most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess barcodes may escape degradation with an enzyme (e.g., exonuclease). In some embodiments, when excess barcodes are attached to a solid support (e.g., bead), the beads can be removed. Removal can occur by, for example, washing, or the use of magnets. At least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess beads/stochastic barcodes can be removed. At most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% of the excess beads/stochastic barcodes can be removed.

In some embodiments, the single-stranded labelled cDNA molecule (i.e., resulting from the reverse transcription reaction) can be attached to the solid support. In some embodiments, the single-stranded labelled cDNA molecule (i.e., resulting from the reverse transcription reaction) may not be attached to the solid support.

In some embodiments, the single-stranded labelled cDNA molecule can be contacted with a second strand synthesis primer of the disclosure. In some embodiments, the second strand synthesis primer can comprise a second universal label (e.g., as described above).

In some embodiments, the second strand synthesis primer can comprise a random multimer sequence. The random multimer sequence can, for example, be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more nucleotides in length. In some embodiments, the random multimer sequence can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more nucleotides in length. In some embodiments, the random multimer sequence can hybridize to a random location on the single-stranded labelled cDNA.

In some embodiments, the second strand synthesis primer can be extended (e.g., using primer extension) over the length of the first cDNA strand, thereby generating a quasi-symmetric stochastically barcoded nucleic acid. The quasi-symmetric stochastically barcoded nucleic acid can, for example, comprise the sequence of the second strand synthesis primer and the labelled cDNA molecule (e.g., first cDNA strand, i.e., including the sequence of the target-binding region and the stochastic barcode). In this way the quasi-symmetric stochastically barcoded nucleic acid can comprise a universal label at each end of the molecule. In some embodiments, the quasi-symmetric stochastically barcoded nucleic acid can be amplified, for example using polymerase chain reaction and a whole transcriptome amplification primer that binds to the first and second universal labels.

Second Strand Synthesis with No Priming

In some embodiments, the methods of the disclosure generate a second strand off a first strand (e.g., reverse transcription reaction) without the use of an added primer. For example, as described in the methods shown in FIG. 15, second strand synthesis can occur with the use of an RNase, DNA Pol I polymerase, and a ligase. The RNase can nick the mRNA strand hybridized to the first strand generated from the reverse transcription reaction, thereby generating nicked mRNA primers. In some embodiments, the RNase can nick the mRNA strand at one or more specific locations. In some embodiments, the RNase can nick the mRNA strand at non-specific locations. The nicked mRNA primers can, for example, be, or be at least, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides in length. In some embodiments, the nicked mRNA primers can be, or be at most, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides in length.

The nicked mRNA primers can, for example, serve as a starting point for extension of the first strand. Extension can occur with a polymerase (e.g., DNA Pol I). The polymerase can degrade downstream mRNA primers (e.g., according to the Gubler-Hoffman method of second strand synthesis). In some embodiments, this method of second strand synthesis can be gene-independent. For example, this method of second strand synthesis may not use a gene-specific primer.

Second Strand Synthesis with a Strand Displacement Polymerase

In some instances, the methods of the disclosure generate a second strand off a first strand (e.g., reverse transcription reaction) without the use of an added primer. For example, as described in the methods shown in FIGS. 16A and 16B, second strand synthesis can occur with the use of an RNase and a strand displacing polymerase. The RNase can nick the mRNA strand hybridized to the first strand generated from the reverse transcription reaction, thereby generating nicked mRNA primers. The RNase can, for example, nick the mRNA strand at one or more specific locations. In some embodiments, the RNase can nick the mRNA strand at non-specific locations. The nicked mRNA primers can be, or be at least, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides in length. The nicked mRNA primers can be, or be at most, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides in length.

The nicked mRNA primers can, for example, serve as a starting point for extension of the first strand. Extension can occur with a strand displacing polymerase (e.g., phi29, Bst DNA Polymerase, Large Fragment, Klenow (Exo−)). The polymerase can extend through a downstream mRNA primer, thereby releasing single-stranded cDNAs, some of which comprise 5′ mRNA primer sequences, of varying lengths. The lengths of the second strand can be proportional to the distance from the 3′ end where the mRNA was nicked. In some embodiments, this method of second strand synthesis can be gene-independent. For example, this method of second strand synthesis may not use a gene-specific primer.

Third Strand Synthesis

In some embodiments, as disclosed in FIGS. 16A and 16B, a third strand can be generated from the single-stranded second strands generated from the strand displacing polymerase as described above. The third strand can be generated by using a primer. The primer can comprise a universal sequence (e.g., a first universal sequence, i.e., the universal sequence of the stochastic barcode used for labeling the mRNA). The primer can be extended, thereby generating a double-stranded cDNA (e.g., comprising a second and third strand).

Adaptor Ligation

In some embodiments, adaptor ligation can be used to associate stochastic barcodes, e.g., universal label, molecular label, cellular label, spatial label, or any combination thereof, to targets. In some embodiments, adaptors can be ligated to the targets, with or without a stochastic barcode. For example, adaptors can be ligated to a double-stranded cDNA, an mRNA/cDNA hybrid, etc. Adaptors can comprise a first universal primer sequence of the disclosure, a second universal primer sequence of the disclosure, a stochastic barcode of the disclosure, a restriction endonuclease cleavage site, or any combination thereof. In some embodiments, the adaptor comprises a second universal primer sequence of the disclosure and a restriction endonuclease binding site. In some embodiments, an adaptor can be ligated to one end of a nucleic acid molecule, e.g., a double-stranded cDNA molecule or an mRNA/cDNA hybrid molecule. In some embodiments, an adaptor can be ligated to the end of a nucleic acid molecule that is not immobilized on a solid support, such as a bead. In some embodiments, both ends of a nucleic acid molecule can be ligated with adaptors that are the same or different. In some embodiments, the nucleic acid molecule can be blunt-ended before the adaptors are ligated.

The term “adaptor” can refer to a single stranded, partially double-stranded, or double-stranded oligonucleotide of at least 10, 15, 20 or 25 bases that can be attached to the end of a nucleic acid. Adaptor sequences can be synthesized using, for example, priming sites, the complement of a priming site, and recognition sites for endonucleases, common sequences and promoters. The adaptor can be entirely or substantially double stranded. A double stranded adaptor can comprise two oligonucleotides that are at least partially complementary. The adaptor can be phosphorylated or unphosphorylated on one or both strands. The adaptor can have a double stranded section and a single stranded overhang section that is completely or partially complementary to an overhang (e.g., generated by a restriction enzyme, or a polymerase enzyme). The overhang in the adaptor can be, for example, 4 to 8 bases. For example, when DNA is digested with the restriction enzyme EcoRI, the resulting double stranded fragments are flanked at either end by the single stranded overhang 5′-AATT-3′; an adaptor that carries a single stranded overhang 5′-AATT-3′ can hybridize to the fragment through complementarity between the overhanging regions. This “sticky end” hybridization of the adaptor to the fragment facilitates ligation of the adaptor to the fragment; however, blunt ended ligation is also possible. Blunt ends can be converted to sticky ends using, for example, the exonuclease activity of the Klenow fragment. For example, when DNA is digested with PvuII, the blunt ends can be converted to a two base pair overhang by incubating the fragments with Klenow in the presence of dTTP and dCTP. Overhangs can also be converted to blunt ends by filling in an overhang or removing an overhang.
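The complementarity between overhanging regions can be checked with a short Python sketch. It relies on the fact that two single-stranded overhangs, each written 5′ to 3′, can anneal when one is the reverse complement of the other; the function names below are illustrative only.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT_TABLE)[::-1]


def overhangs_compatible(fragment_overhang: str, adaptor_overhang: str) -> bool:
    """True if two overhangs (both written 5' to 3') can base-pair fully."""
    return fragment_overhang.upper() == reverse_complement(adaptor_overhang)


if __name__ == "__main__":
    # The EcoRI overhang 5'-AATT-3' is its own reverse complement, so an
    # adaptor carrying 5'-AATT-3' can pair with an EcoRI-digested fragment.
    print(overhangs_compatible("AATT", "AATT"))
```

Because 5′-AATT-3′ is palindromic in this sense, the same overhang sequence appears on both the fragment and the adaptor.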

Adaptors of the disclosure can be designed such that once attached (e.g., ligated) to the stochastically barcoded nucleic acid, they can be involved in suppression and/or semi-suppression PCR amplification (e.g., the resulting quasi-symmetric stochastically barcoded nucleic acid with the adaptor can undergo suppression and/or semi-suppression PCR). In some embodiments, adaptors of the disclosure can be designed to comprise a stochastic barcode such that the adaptors can be used to stochastically barcode a nucleic acid.

In some embodiments, the adaptor can have a structure that can enhance suppression (suppressive structure of adaptor) or inhibit suppression (permissive structure of adaptor). For example, the adaptor can be tuned, e.g., by changes in its sequence, to affect the level of suppression. The level of suppression can be related to the amount of amplification of artifacts in the sample.

In some embodiments, the equilibrium constant associated with the formation of the suppressive and the permissive structures, and, therefore, the efficiency of suppression of particular DNA fragments during PCR, can be related to, for example, differences in melting temperature of the suppressive and permissive structures, length of the suppression adaptor, and primary structure of the adaptor. If the suppressor sequence portion of the adapter is roughly equal to or longer than the primer binding portion, then the suppressive structure can be preferentially formed due to the higher melting temperatures of the suppressive structure versus the permissive structure. If the suppressor portion is about one half, or less, the length of the primer binding portion, or is absent altogether, then the amplification permissive structure can be formed. The suppressive structure can have a melting temperature at least 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% or more than the permissive structure. The suppressive structure can have a melting temperature at most 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% or more than the permissive structure. The suppressive structure can have a melting temperature at least 0.01-fold, 0.05-fold, 0.1-fold, 0.5-fold, or 1-fold or more than the permissive structure. The suppressive structure can have a melting temperature at most 0.01-fold, 0.05-fold, 0.1-fold, 0.5-fold, or 1-fold or more than the permissive structure.

In addition to sequence length, the differences in melting temperatures of the suppressive and permissive structures can be determined by the relative ratio of guanosine and cytidine residues to adenosine and thymidine residues (hereinafter this ratio is referred to as the GC content of the sequence) in the primer binding and suppressor sequence portion of the adapter. For an adapter having a primer binding portion and a suppressor portion of fixed length, the higher the GC content of the suppressor portion, the greater the efficiency of suppression that can be achieved (using the same primer binding portion of the adapter).
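To illustrate how GC content can influence melting temperature, the Python sketch below computes the fraction of G and C residues in a sequence and a rough melting temperature using the Wallace rule (2 °C per A or T, 4 °C per G or C). The Wallace rule is a general rule of thumb for short oligonucleotides and is used here only as an assumption; it is not a formula given in this disclosure, GC content is computed as a fraction of all residues rather than as a GC-to-AT ratio, and the example sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of residues in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule.

    This approximation is intended for short oligonucleotides only and
    ignores salt concentration, strand concentration, and mismatches.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two hypothetical suppressor portions of equal length.
    for name, seq in (("low GC", "ATTATAATTA"), ("high GC", "GCCGCGGCCG")):
        print(name, gc_fraction(seq), wallace_tm(seq))
```

For two portions of the same length, the one with the higher GC content yields the higher estimate, which is consistent with the trend described above.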

In some embodiments, the longer the suppressor portion of the adaptor, the more efficient the suppression when used in conjunction with the same primer binding portion of the adapter. The suppressor portion of the adaptor can be, or be at least, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 40 or more nucleotides in length. The suppressor portion of the adaptor can be, or be at most, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, or 40 or more nucleotides in length.

In some embodiments, the adaptor does not comprise any sequences that can result in the formation of “hairpins” or other secondary structures which can, for example, prevent adapter ligation and/or primer extension.

In some instances, adaptors of the disclosure can be the adaptors described in FIG. 17. The adaptor can comprise a sequencing primer sequence (e.g., an Illumina Read 1 (IR1) sequence). The WTA primer of the disclosure can hybridize to a subsequence of the sequencing primer sequence 1705.

An adaptor of the disclosure can comprise a suppression sequence 1710 and a restriction enzyme binding site 1715. In some embodiments, the adaptor can be designed such that there is a mismatch between the RT primer sequence and the adaptor, see “*”. For example, the number of mismatches can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more. In some embodiments, the number of mismatches can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more. In some embodiments, the number of mismatches is 1. In some embodiments, the number of mismatches is 4. In some embodiments, the suppression sequence can comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. In some embodiments, the suppression sequence can comprise at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. In some embodiments, a low suppression adaptor can comprise more mismatches with the RT primer sequence, thereby reducing the ability of the pan-handle structure to form, thereby limiting suppression. In some embodiments, a high suppression adaptor can comprise fewer mismatches with the RT primer sequence, thereby increasing the ability of the pan-handle structure to form, thereby increasing suppression.
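The number of mismatches between an adaptor and the RT primer sequence can be counted position by position, as in the following Python sketch. It assumes the two sequences have already been aligned, have equal length, and are written in the same orientation; the sequences shown are hypothetical and are not those of FIG. 17.

```python
def count_mismatches(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


if __name__ == "__main__":
    rt_primer_region = "ACGTACGT"  # hypothetical RT primer subsequence
    adaptor_region = "ACGTTCGA"    # hypothetical adaptor subsequence
    print(count_mismatches(rt_primer_region, adaptor_region))
```

In this framing, an adaptor design with a higher count would be expected to behave as a lower suppression adaptor, and one with a lower count as a higher suppression adaptor.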

Adaptors can be ligated to double-stranded cDNAs of the disclosure. Ligation methods can include using T4 DNA Ligase, which catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl termini in duplex DNA or RNA with blunt and sticky ends; Taq DNA Ligase, which catalyzes the formation of a phosphodiester bond between juxtaposed 5′ phosphate and 3′ hydroxyl termini of two adjacent oligonucleotides which are hybridized to a complementary target DNA; E. coli DNA ligase, which catalyzes the formation of a phosphodiester bond between juxtaposed 5′-phosphate and 3′-hydroxyl termini in duplex DNA containing cohesive ends; and T4 RNA ligase, which catalyzes ligation of a 5′ phosphoryl-terminated nucleic acid donor to a 3′ hydroxyl-terminated nucleic acid acceptor through the formation of a 3′ to 5′ phosphodiester bond (substrates include single-stranded RNA and DNA as well as dinucleoside pyrophosphates); or any other methods known in the art. Different enzymes generate different overhangs and the overhang of the adaptor can be targeted to ligate to fragments generated by selected restriction enzymes.

In some embodiments, a double stranded adaptor is used and only one strand of the adaptor is ligated to the double-stranded cDNA. Ligation of one strand of an adaptor can be selectively blocked. To block ligation, for example, one strand of the adaptor can be designed to introduce a gap of one or more nucleotides between the 5′ end of that strand of the adaptor and the 3′ end of the target nucleic acid. Absence of a phosphate from the 5′ end of an adaptor can block ligation of that 5′ end to an available 3′OH. Ligation of an adaptor to a double-stranded labeled cDNA of the disclosure can result in a quasi-symmetric nucleic acid (e.g., one or more strands of the nucleic acid comprises a first and second universal label, as shown in FIGS. 15A and 15B, 1575).

In some embodiments, whole transcriptome amplification can be performed using an adaptor ligation method as described in FIGS. 15A and 15B. A target 1505 can comprise a poly-A tail. A target 1505 can be an mRNA. The target 1505 can be hybridized to a stochastic barcode 1510. The stochastic barcode 1510 can comprise a number of labels. For example, a stochastic barcode 1510 can comprise a target-specific region (e.g., oligo dT for binding to poly-A tails of mRNAs) 1515, a molecular label 1520, a cellular label 1525, and a first universal label 1530. The stochastic barcode can be reverse transcribed 1535 using a reverse transcriptase, thereby generating a labelled-cDNA molecule 1556. Excess stochastic barcodes 1511 can be treated 1540 with a degradation enzyme 1545. The degradation enzyme 1545 can be an exonuclease.

The labelled-cDNA molecule 1556 can undergo second strand synthesis 1557 thereby generating a double-stranded labeled cDNA molecule 1558. Second strand synthesis can be performed by contacting the labelled cDNA molecule-mRNA hybrid with a nicking enzyme (e.g., RNaseH) that can nick the mRNA hybridized to the labelled cDNA molecule 1556, thereby generating nicked mRNA. The nicked mRNA can be used as a primer and extended using a polymerase (e.g., DNA Pol I), thereby incorporating the sequence of the first strand. The polymerase can comprise 5′-3′ exonuclease activity. The polymerase can degrade the downstream mRNA nicks that serve as the primers for the second strand synthesis. A ligase can be used to ligate the extended sequences together, thereby generating a second strand (e.g., double-stranded labeled cDNA molecule 1558).

The double-stranded labeled cDNA molecule 1558 can comprise a sequence 1531 that is complementary to the first universal label 1530. The double-stranded labeled cDNA molecule 1558 can be contacted with an adaptor 1560. The adaptor 1560 can be double-stranded. The adaptor 1560 can comprise a restriction endonuclease cleavage site 1562. The adaptor 1560 can comprise a second universal primer sequence 1561 (that is the same as 1531). The adaptor 1560 can comprise a 3′ overhang. The adaptor 1560 can comprise a free 5′ phosphate (P) which can ligate to the 3′ hydroxyl of the double-stranded labelled-cDNA molecule 1558. The adaptor 1560 can ligate to both strands of the double-stranded labelled cDNA molecule 1558.

The product can be amplified using one or more WTA amplification primers 1565. The product can be amplified such that one strand is linearly amplified 1570 and one strand is exponentially amplified 1575. The linearly amplified strand 1570 can comprise the amplifiable universal sequence at one end. The exponentially amplifiable strand 1575 can comprise universal sequences at both ends. The exponentially amplifiable strand 1575 can comprise different universal sequences at both ends, thereby generating a quasi-symmetric stochastically barcoded nucleic acid. The WTA amplified product (e.g., quasi-symmetric stochastically barcoded nucleic acid) 1575 can be subjected to downstream methods of the disclosure such as random priming and/or sequencing.
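The practical difference between the linearly amplified strand 1570 and the exponentially amplified strand 1575 can be illustrated with an idealized calculation. The sketch below is a textbook simplification and not a model taken from the disclosure: it assumes that a strand with a primer binding site at only one end gains at most one copy per cycle from the original template, while a strand with primer binding sites at both ends can at most double each cycle. The function name and the efficiency parameter are hypothetical.

```python
def copies_after_cycles(cycles: int, efficiency: float = 1.0) -> tuple:
    """Return idealized (linear, exponential) copy numbers per starting template.

    efficiency is the assumed fraction of templates copied in each cycle
    (1.0 means every template is copied in every cycle).
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    linear = 1 + cycles * efficiency
    exponential = (1 + efficiency) ** cycles
    return linear, exponential


for n in (5, 10, 15):
    linear, exponential = copies_after_cycles(n)
    print(n, linear, exponential)
```

Even at modest cycle numbers the exponentially amplifiable strand dominates the product pool under these assumptions, which is consistent with the amplified product being described as predominantly quasi-symmetric.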

In some embodiments, the second strand can be synthesized using a strand displacement step. As shown in FIG. 16, an mRNA can be reverse transcribed into a cDNA using a stochastic barcode primer (as shown in FIGS. 15A and 15B), thereby generating a labeled cDNA 1605. The labeled cDNA 1605 can be treated with a nicking enzyme such as RNaseH that can nick the RNA 1610. The nicked RNAs 1610 can serve as primers for extension, thereby copying the sequence of the labeled cDNA 1605. The nicked RNAs 1610 can be extended with a strand displacing polymerase. The resulting strand displacement fragments 1620 can comprise an RNA end 1621 that was the primer for the extension reaction. The method can include an RNA removal step to remove the RNA from the primer/strand displacement fragments 1620. The stochastic barcode of the strand displacement fragments 1620 can be the complement of the original barcode. The strand displacement fragments 1620 can comprise a blunt end. The strand displacement fragments 1620 can be extended with a nucleic acid 1625 comprising a sequence or a complementary sequence to the original universal sequence of the stochastic barcode (see FIGS. 15A and 15B), thereby generating a double-stranded whole transcriptome amplification product 1630. The double-stranded whole transcriptome amplification product 1630 can comprise a blunt end or an A-overhang. The double-stranded whole transcriptome amplification product 1630 can be subject to downstream methods of the disclosure such as adaptor ligation (see FIGS. 15A and 15B) and whole transcriptome amplification, which can generate a quasi-symmetric stochastically barcoded nucleic acid of the disclosure.

Adaptor Ligation with Solid Support

In some embodiments, whole transcriptome amplification can be performed using adaptor ligation methods with a solid support, such as beads. As disclosed herein, stochastic barcodes immobilized on a solid support, such as beads, can be used to label a plurality of targets from a sample, such as a single cell. In some embodiments, all the stochastic barcodes on a bead comprise the same cellular label. In some embodiments, the stochastic barcodes on a bead comprise different molecular labels.

An exemplary embodiment of adaptor ligation with solid support is shown in FIGS. 31A-31D. In FIGS. 31A-31D, a target mRNA comprises a poly-A tail. The target can be hybridized to a stochastic barcode immobilized on a bead 3125. The stochastic barcode can comprise a number of labels. For example, a stochastic barcode can comprise a target-specific region (e.g., oligo dT for binding to poly-A tails of mRNAs) 3120, a molecular label 3115, a cellular label 3110, and a first universal label 3105. The stochastic barcode can be reverse transcribed 3130 using a reverse transcriptase, thereby generating a labelled-cDNA molecule 3135 that is immobilized on the bead. Excess stochastic barcodes can be removed 3140 with a degradation enzyme. The degradation enzyme can be an exonuclease.

The labelled-cDNA molecule 3135 can undergo second strand synthesis 3150 thereby generating a double-stranded labeled cDNA molecule 3155. Second strand synthesis can be performed by contacting the labelled cDNA molecule-mRNA hybrid with a nicking enzyme (e.g., RNaseH) that can nick the mRNA hybridized to the labelled cDNA molecule 3135, thereby generating nicked mRNA. The nicked mRNA can be used as a primer and extended using a polymerase (e.g., DNA Pol I), thereby incorporating the sequence of the first strand. The polymerase can comprise 5′-3′ exonuclease activity. The polymerase can degrade the downstream mRNA nicks that serve as the primers for the second strand synthesis. A ligase can be used to ligate the extended sequences together, thereby generating a second strand (e.g., double-stranded labeled cDNA molecule 3155).

The double-stranded labeled cDNA molecule 3155 can be end-polished at step 3160 and A-tailed 3165 at the free end to prepare for adaptor ligation. The double-stranded labeled cDNA molecule 3155 can be contacted with an adaptor 3170. The adaptor can be single stranded, partially double-stranded, or fully double-stranded. The adaptor can comprise a restriction site 3185, for example, an AsiSI site. The adaptor can comprise a 5′ overhang which can comprise a second universal primer sequence 3175. The adaptor can comprise a free 5′ phosphate (P) which can ligate to the 3′ hydroxyl of the double-stranded labelled-cDNA molecule 3155. The adaptor can ligate to both strands of the double-stranded labelled cDNA molecule 3155.

Optionally, the ligation product can be used as a template for third strand synthesis 3190. A CBO40 primer 3195 can be used to hybridize to the second universal primer sequence 3175 to synthesize the third strand. The third strand synthesis product can be WTA amplified using one or more WTA amplification primers, for example, a CBO40 primer, and a DNA polymerase, for example, KAPA Fast2G. The WTA amplified product can be subjected to downstream methods of the disclosure such as random priming and/or sequencing.

Adaptor Ligation Using a Transposome

In some embodiments, the adaptor ligation step can be accomplished using a transposome-based approach. Transposome-based adaptor ligation can be used to label targets in a sample in a variety of ways. For example, transposome-based adaptor ligation can be used to label targets that comprise stochastic barcodes, which can be immobilized on a solid support, on one end. In some embodiments, transposome-based adaptor ligation can be used to label targets that comprise no stochastic barcodes on either end. Therefore, transposome-based adaptor ligation can be used to stochastically label targets on one or both ends. In some embodiments, transposome-based adaptor ligation can be used to label the targets with a universal label, a molecular label, a cellular label, a spatial label, or any combination thereof. In some embodiments, two adaptors can be ligated to a target on both ends, which can be the same or different.

In some embodiments, the targets can be double-stranded cDNA molecules produced by reverse transcription and second strand synthesis methods as disclosed herein. In some embodiments, no second strand synthesis is needed before adding the adaptor loaded transposase to the targets, which can be mRNA/cDNA hybrids. In some embodiments, the transposome can randomly fragment the targets. For example, the transposome can fragment the targets into fragments having a size that is, is about, is less than, or is more than 20 nt, 30 nt, 40 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or a range between any two of the above values.

In some embodiments, targets that are immobilized on a solid support can be ligated using a transposome, so that the solid support can be used to remove fragments that are not immobilized on the solid support. In some embodiments, targets that are immobilized on a solid support can be ligated using a transposome, so that the solid support can be used to remove fragments that are immobilized on the solid support.

The adaptors disclosed herein can be loaded into a transposase, which is added to the targets, such as stochastically barcoded molecules. In some embodiments, the transposase can be the Tn5 transposase, or a hyperactive derivative thereof, as disclosed in Adey et al., Genome Biology (2010) 11:R119, the content of which is hereby incorporated by reference in its entirety. In some embodiments, the adaptor can comprise a wild-type transposon DNA sequence, or derivative thereof.

As shown in FIGS. 32A-32E, a target mRNA can comprise a poly-A tail. The target can be hybridized to a stochastic barcode immobilized on a bead 3225. The stochastic barcode can comprise a number of labels. For example, a stochastic barcode can comprise a target-specific region (e.g., oligo dT for binding to poly-A tails of mRNAs) 3220, a molecular label 3215, a cellular label 3210, and a first universal label 3205. The stochastic barcode can be reverse transcribed 3230 using a reverse transcriptase, thereby generating a labelled-cDNA molecule 3235 that is immobilized on the bead. Excess stochastic barcodes can be removed 3240 with a degradation enzyme. The degradation enzyme can be an exonuclease.

The labelled-cDNA molecule 3235 can undergo second strand synthesis 3250 thereby generating a double-stranded labeled cDNA molecule 3255. Second strand synthesis can be performed by contacting the labelled cDNA molecule-mRNA hybrid with a nicking enzyme (e.g., RNaseH) that can nick the mRNA hybridized to the labelled cDNA molecule 3235, thereby generating nicked mRNA. The nicked mRNA can be used as a primer and extended using a polymerase (e.g., DNA Pol I), thereby incorporating the sequence of the first strand. The polymerase can comprise 5′-3′ exonuclease activity. The polymerase can degrade the downstream mRNA nicks that serve as the primers for the second strand synthesis. A ligase can be used to ligate the extended sequences together, thereby generating a second strand (e.g., double-stranded labeled cDNA molecule 3255).

The double-stranded labeled cDNA molecule 3255 can be treated with a transposome 3260 (e.g., Nextera or Nextera XT from Illumina) for adaptor ligation. The double-stranded labeled cDNA molecule 3255 can be contacted with a transposome 3265 that is loaded with an adaptor 3270. The double-stranded labeled cDNA molecule 3255 can be contacted with a second transposome 3275 that is loaded with a second adaptor 3280. The transposome can fragment 3290 the double-stranded labeled cDNA molecule 3255 and ligate the double stranded adaptor. The double stranded adaptor can comprise a universal primer binding sequence, for example, a transposase primer. In some embodiments, the transposome-based adaptor ligation can be conducted on an mRNA-cDNA hybrid. Therefore, no second strand synthesis is needed.

The adaptor ligated product can be amplified 3295 using one or more library amplification primers, for example, an ILR2 primer and a transposase primer to generate an indexed library. A second amplification step can be used to finish creating a sequencing library using full-length P5 and P7 primers. P5 and P7 primers are used in generating sequencing libraries for the HiSeq and MiSeq platforms by Illumina.

Adaptor Ligation by Template Switching

In some embodiments, the disclosure provides for methods of adaptor addition to a stochastically barcoded molecule using template switching. As shown in FIG. 21, an mRNA 2120 can be contacted with a stochastic barcode 2110 that can be conjugated to a solid support (e.g., bead) 2105. The stochastic barcode 2110 can comprise any label of the disclosure (e.g., a molecular label, a cellular label, and a universal label 2111) and a target-specific region 2115. The mRNA 2120 can be reverse transcribed into a cDNA. An adaptor 2125 (e.g., adaptor of the disclosure) can be added to the cDNA using template switching. Second strand synthesis can occur on the adaptor-added cDNA, thereby resulting in a double-stranded cDNA 2130 that can undergo semi-suppressive PCR. The double-stranded cDNA 2130 can undergo semi-suppressive PCR because the adaptor can comprise a sequence 2112 that is at least partially complementary to the universal label 2111. The double-stranded cDNA 2130 can be referred to as a quasi-symmetric stochastically barcoded nucleic acid.

Amplification with Whole Transcriptome Amplification Primer

The stochastically barcoded nucleic acid (e.g., comprising a homopolymer tail or an adaptor, generated from random priming, or generated from the non-priming second strand and third strand synthesis or adaptor ligation methods of the disclosure) can be amplified using a whole transcriptome amplification primer. In some embodiments, the whole transcriptome amplification primer can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length. In some embodiments, the whole transcriptome amplification primer can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length. In some embodiments, the whole transcriptome amplification primer can bind to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides of the first and second universal labels. In some embodiments, the whole transcriptome amplification primer can bind to at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides of the first and second universal labels. The whole transcriptome amplification primer may not bind to the entirety of the first and/or second universal labels. In some instances, the whole transcriptome amplification primer binds to 18 nucleotides of the first and/or second universal labels.

Amplification with a whole transcriptome amplification primer can generate a whole transcriptome amplicon. The whole transcriptome amplicon can be quasi-symmetric. The whole transcriptome amplicon can be stored (e.g., at −20° C., −80° C.). The whole transcriptome amplicons can represent an immortalized library of transcripts from a sample (e.g., a single cell).

Amplification

One or more nucleic acid amplification reactions can be performed to create multiple copies of the stochastically barcoded nucleic acid (e.g., comprising a homopolymer tail or an adaptor, generated from random priming, or generated from the non-priming second strand and third strand synthesis or adaptor ligation methods of the disclosure). Amplification can be performed in a multiplexed manner, wherein multiple target nucleic acid sequences are amplified simultaneously. The amplification reaction can be used to add sequencing adaptors to the nucleic acid molecules. The amplification reactions can comprise amplifying at least a portion of a sample label, if present. The amplification reactions can comprise amplifying at least a portion of the cellular and/or molecular label. The amplification reactions can comprise amplifying at least a portion of a sample tag, a cellular label, a spatial label, a molecular label, a target nucleic acid, or a combination thereof. The amplification reactions can comprise amplifying at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 100% of the plurality of nucleic acids in a sample, such as a single cell. The method can further comprise conducting one or more cDNA synthesis reactions to produce one or more cDNA copies of target-barcode molecules comprising a sample label, a cellular label, a spatial label, and/or a molecular label.

In some embodiments, amplification can be performed using a polymerase chain reaction (PCR). As used herein, PCR can refer to a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. As used herein, PCR can encompass derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, digital PCR, and assembly PCR.

Amplification of the labeled nucleic acids can comprise non-PCR based methods. Examples of non-PCR based methods include, but are not limited to, multiple displacement amplification (MDA), transcription-mediated amplification (TMA), whole transcriptome amplification (WTA), whole genome amplification (WGA), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), real-time SDA, rolling circle amplification, or circle-to-circle amplification. Other non-PCR-based amplification methods include multiple cycles of DNA-dependent RNA polymerase-driven RNA transcription amplification or RNA-directed DNA synthesis and transcription to amplify DNA or RNA targets, a ligase chain reaction (LCR), and a Qβ replicase (Qβ) method, use of palindromic probes, strand displacement amplification, oligonucleotide-driven amplification using a restriction endonuclease, an amplification method in which a primer is hybridized to a nucleic acid sequence and the resulting duplex is cleaved prior to the extension reaction and amplification, strand displacement amplification using a nucleic acid polymerase lacking 5′ exonuclease activity, rolling circle amplification, and ramification extension amplification (RAM). In some embodiments, the amplification may not produce circularized transcripts.

Suppression PCR can be used for amplification methods of the disclosure. Suppression PCR can refer to the selective exclusion of molecules less than a certain size flanked by terminal inverted repeats, due to their inefficient amplification when the primer(s) used for amplification correspond(s) to the entire repeat or a fraction of the repeat. The reason for this can lie in the equilibrium between productive PCR primer annealing and nonproductive self-annealing of the fragment's complementary ends. At a fixed size of a flanking terminal inverted repeat, the shorter the insert, the stronger the suppression effect and vice versa. Likewise, at a fixed insert size, the longer the terminal inverted repeat, the stronger the suppression effect.

Suppression PCR can use adaptors that are ligated to the end of a DNA fragment prior to PCR amplification. Upon melting and annealing, single-stranded DNA fragments having self-complementary adaptors at the 5′- and 3′-ends of the strand can form suppressive “tennis racquet” shaped structures that suppress amplification of the fragments during PCR.
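Whether a single strand carries the self-complementary termini that allow such a structure to form can be checked computationally. The following is a minimal sketch, assuming DNA sequences written 5′ to 3′ with only the letters A, C, G, and T; the function names and the example adaptor and insert sequences are hypothetical and are not taken from the disclosure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(strand: str, max_len: int = 40) -> int:
    """Length of the longest 5' terminal segment whose reverse complement
    forms the 3' terminus of the same strand (0 if there is none)."""
    strand = strand.upper()
    best = 0
    for k in range(1, min(max_len, len(strand) // 2) + 1):
        if reverse_complement(strand[:k]) == strand[-k:]:
            best = k
        else:
            break  # a longer repeat requires every shorter one to match
    return best


# Hypothetical 12-nt adaptor flanking a short hypothetical insert.
adaptor = "AGCAGTGGTATC"
insert = "TTTTTTTT"
strand = adaptor + insert + reverse_complement(adaptor)

print(terminal_inverted_repeat_length(strand))
```

A longer terminal inverted repeat relative to the insert corresponds, per the passage above, to stronger expected suppression; the sketch only measures the repeat and does not model annealing equilibria.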

In some embodiments, the methods disclosed herein further comprise conducting a polymerase chain reaction on the labeled nucleic acid (e.g., labeled-RNA, labeled-DNA, labeled-cDNA) to produce a stochastically labeled-amplicon. The labeled-amplicon can be a double-stranded molecule. The double-stranded molecule can comprise a double-stranded RNA molecule, a double-stranded DNA molecule, or an RNA molecule hybridized to a DNA molecule. One or both of the strands of the double-stranded molecule can comprise a sample label, a spatial label, a cellular label, and/or a molecular label. The stochastically labeled-amplicon can be a single-stranded molecule. The single-stranded molecule can comprise DNA, RNA, or a combination thereof. The nucleic acids of the disclosure can comprise synthetic or altered nucleic acids.

Amplification can comprise use of one or more non-natural nucleotides. Non-natural nucleotides can comprise photolabile or triggerable nucleotides. Examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). Non-natural nucleotides can be added to one or more cycles of an amplification reaction. The addition of the non-natural nucleotides can be used to identify products at specific cycles or time points in the amplification reaction.

Conducting the one or more amplification reactions can comprise the use of one or more primers. The one or more primers can comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 or more nucleotides. The one or more primers can comprise less than 12-15 nucleotides. The one or more primers can anneal to at least a portion of the plurality of stochastically labeled targets. The one or more primers can anneal to the 3′ end or 5′ end of the plurality of stochastically labeled targets. The one or more primers can anneal to an internal region of the plurality of stochastically labeled targets. The internal region can be, or be at least about, 50, 100, 150, 200, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900 or 1000 nucleotides from the 3′ ends of the plurality of stochastically labeled targets. The one or more primers can comprise a fixed panel of primers. The one or more primers can comprise at least one or more custom primers. The one or more primers can comprise at least one or more control primers. The one or more primers can comprise at least one or more gene-specific primers.

The one or more primers can comprise any universal primer of the disclosure. The universal primer can anneal to a universal primer binding site. The one or more custom primers can anneal to a first sample label, a second sample label, a spatial label, a cellular label, a molecular label, a target, or any combination thereof. The one or more primers can comprise a universal primer and a custom primer. The custom primer can be designed to amplify one or more targets. The targets can comprise a subset of the total nucleic acids in one or more samples. The targets can comprise a subset of the total stochastically labeled targets in one or more samples. The one or more primers can comprise at least 96 or more custom primers. The one or more primers can comprise at least 960 or more custom primers. The one or more primers can comprise at least 9600 or more custom primers. The one or more custom primers can anneal to two or more different labeled nucleic acids. The two or more different labeled nucleic acids can correspond to one or more genes.

Any amplification scheme can be used in the methods of the present disclosure. For example, in one scheme, the first round PCR can amplify molecules (e.g., attached to the bead) using a gene specific primer and a primer against the universal Illumina sequencing primer 1 sequence. The second round of PCR can amplify the first PCR products using a nested gene specific primer flanked by Illumina sequencing primer 2 sequence, and a primer against the universal Illumina sequencing primer 1 sequence. The third round of PCR adds P5 and P7 and sample index to turn PCR products into an Illumina sequencing library. Sequencing using 150 bp×2 sequencing can reveal the cell label and molecular index on read 1, the gene on read 2, and the sample index on index 1 read.
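The read layout described above lends itself to counting molecules by collapsing reads that share a cell label, a gene, and a molecular index. The following is a minimal sketch of that idea under loudly stated assumptions: the label lengths, the order of the labels within read 1, and the function names are hypothetical and are not specified by the disclosure, and the gene assignment for read 2 is assumed to have been made upstream (e.g., by alignment).

```python
from collections import defaultdict

# Hypothetical layout of read 1: cell label first, then molecular index.
CELL_LABEL_LEN = 9        # assumed length, for illustration only
MOLECULAR_INDEX_LEN = 8   # assumed length, for illustration only


def split_read1(read1: str) -> tuple:
    """Split read 1 into (cell label, molecular index) under the assumed layout."""
    cell_label = read1[:CELL_LABEL_LEN]
    molecular_index = read1[CELL_LABEL_LEN:CELL_LABEL_LEN + MOLECULAR_INDEX_LEN]
    return cell_label, molecular_index


def count_molecules(records) -> dict:
    """Count distinct molecular indices per (cell label, gene).

    records is an iterable of (read1_sequence, gene_name) pairs, where the
    gene name stands in for the assignment derived from read 2.
    """
    seen = defaultdict(set)
    for read1, gene in records:
        cell_label, molecular_index = split_read1(read1)
        seen[(cell_label, gene)].add(molecular_index)
    return {key: len(indices) for key, indices in seen.items()}


# Made-up reads: the first two are amplification duplicates of one molecule.
records = [
    ("AAAAAAAAA" + "CCCCCCCC" + "TTTT", "GAPDH"),
    ("AAAAAAAAA" + "CCCCCCCC" + "TTTT", "GAPDH"),
    ("AAAAAAAAA" + "GGGGGGGG" + "TTTT", "GAPDH"),
    ("TTTTTTTTT" + "CCCCCCCC" + "TTTT", "ACTB"),
]
print(count_molecules(records))
```

Because duplicates created during amplification share the same molecular index, they collapse to a single count, so the result is intended to reflect molecule number rather than read number. Sequencing errors in the labels are ignored in this sketch.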

Amplification can be performed in one or more rounds. In some instances there are multiple rounds of amplification. Amplification can comprise two or more rounds of amplification. The first amplification can be an extension off X′ to generate the gene specific region. The second amplification can occur when a sample nucleic acid hybridizes to the newly generated strand.

In some embodiments, hybridization does not need to occur at the end of a nucleic acid molecule. In some embodiments, a target nucleic acid within an intact strand of a longer nucleic acid is hybridized and amplified. For example, the target can be within a longer section of genomic DNA or mRNA. A target can be more than 50 nt, more than 100 nt, or more than 1000 nt from an end of a polynucleotide.

Library Preparation

The disclosure provides for methods for library preparation. In some embodiments, the stochastically barcoded nucleic acid (e.g., comprising a homopolymer tail or an adaptor, generated from random priming, or generated from the non-priming second strand and third strand synthesis or adaptor ligation methods of the disclosure) and/or the whole transcriptome amplicons therefrom, or the quasi-symmetric stochastically barcoded nucleic acid and/or the whole transcriptome amplicons therefrom, can be used for library preparation. In some instances, the stochastically barcoded nucleic acid (e.g., comprising a homopolymer tail or an adaptor, generated from random priming, or generated from the non-priming second strand and third strand synthesis or adaptor ligation methods of the disclosure) and/or the whole transcriptome amplicons therefrom, or the quasi-symmetric stochastically barcoded nucleic acid and/or the whole transcriptome amplicons (e.g., resulting from whole transcriptome amplification of the quasi-symmetric stochastically barcoded nucleic acid) can comprise a restriction endonuclease cleavage site. Cleavage of the restriction endonuclease cleavage site can occur with a restriction endonuclease as disclosed herein. Treatment with a restriction endonuclease can result in at least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% cleavage of the restriction site. Treatment with a restriction endonuclease can result in at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% cleavage of the restriction site. Cleavage of the quasi-symmetric stochastically barcoded nucleic acid and/or the whole transcriptome amplicons can represent a first mechanism for breaking the symmetry of the quasi-symmetric stochastically barcoded nucleic acid. Cleavage can result in an asymmetric stochastically barcoded nucleic acid and/or an asymmetric whole transcriptome amplicon (these terms, as used herein, can be used interchangeably). The quasi-symmetric stochastically barcoded nucleic acid and/or amplicon of the quasi-symmetric stochastically barcoded nucleic acid may not be cleaved.
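As an illustration of symmetry breaking by restriction cleavage, the sketch below cuts the top strand of a sequence at each occurrence of a recognition site. It uses AsiSI, which the disclosure names as an example restriction site, and assumes the commonly documented recognition sequence GCGATCGC with the top-strand cut after GCGAT; this cut position should be verified against the enzyme supplier's documentation. The function name and the example amplicon are hypothetical, and only the top strand is modeled.

```python
ASISI_SITE = "GCGATCGC"   # assumed AsiSI recognition sequence
ASISI_CUT_OFFSET = 5      # assumed top-strand cut position: GCGAT^CGC


def cleave_top_strand(seq: str, site: str = ASISI_SITE,
                      cut_offset: int = ASISI_CUT_OFFSET) -> list:
    """Return top-strand fragments after cutting at every occurrence of site."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical amplicon: insert followed by a site-containing adaptor end.
amplicon = "AAAGCGATCGCTTT"
print(cleave_top_strand(amplicon))
```

A sequence without the site is returned as a single fragment, which corresponds to the uncleaved case noted above.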

The stochastically barcoded nucleic acid (e.g., comprising a homopolymer tail or an adaptor, generated from random priming, or generated from the non-priming second strand and third strand synthesis or adaptor ligation methods of the disclosure) and/or the whole transcriptome amplicons therefrom, or the asymmetric stochastically barcoded nucleic acid, quasi-symmetric stochastically barcoded nucleic acid and/or amplicon (e.g., uncleaved amplicon, i.e., amplicon of the quasi-symmetric stochastically barcoded nucleic acid) can be subjected to random priming. In some embodiments, a degenerate primer comprising a random multimer sequence and a third universal label can be contacted to the stochastically barcoded nucleic acid and/or amplicon. For example, the random multimer sequence can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more nucleotides in length. In some embodiments, the random multimer sequence can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more nucleotides in length. The random multimer sequence can hybridize to a random location on the stochastically barcoded nucleic acid and/or amplicon.
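The structure of such a degenerate primer can be sketched in a few lines. The example below assumes, as is conventional for extendable primers but not stated in the passage above, that the third universal label sits at the 5′ end and the random multimer at the 3′ end. The function name, the 9-nucleotide default length, and the label sequence are hypothetical.

```python
import random


def make_degenerate_primer(third_universal_label: str,
                           multimer_len: int = 9,
                           rng: random.Random = None) -> str:
    """Build one primer: universal label (5') followed by a random multimer (3')."""
    if multimer_len < 1:
        raise ValueError("multimer_len must be at least 1")
    rng = rng or random.Random()
    multimer = "".join(rng.choice("ACGT") for _ in range(multimer_len))
    return third_universal_label.upper() + multimer


# Hypothetical label sequence; a seeded generator makes the pool reproducible.
label = "ACACGACGCTCTTC"
rng = random.Random(0)
pool = [make_degenerate_primer(label, 9, rng) for _ in range(5)]
for primer in pool:
    print(primer)
```

In practice a degenerate primer pool is typically synthesized with mixed bases (N) at each random position rather than enumerated; the sketch simply samples individual members of such a pool.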

The third universal label of the primer oligonucleotide can be identical to the first and/or second universal labels of the disclosure. The third universal label of the primer oligonucleotide can be different from the first and/or second universal labels of the disclosure. The third universal label of the primer oligonucleotide can differ from the first and/or second universal labels by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The third universal label of the primer oligonucleotide can differ from the first and/or second universal labels by at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more nucleotides. The third universal label can comprise a sequencing primer binding site. For example, the first and/or second universal labels can comprise a first sequencing primer binding site (e.g., Illumina read 2 primer). The third universal label can comprise a second sequencing primer binding site (e.g., Illumina read 1 primer).

The random multimer can hybridize to the sense and/or antisense strand of the stochastically barcoded nucleic acid and/or amplicon. The degenerate primer can be extended (e.g., with primer extension), thereby generating 3′ and 5′ read products (e.g., products that can generate 3′ and 5′ reads on a sequencer). 3′ and 5′ read products can be referred to as asymmetric read products. Some of the read products can comprise the sequence of the target nucleic acid, the stochastic barcode, and the first universal label (e.g., the 3′ read products). Some of the polynucleotide products can comprise the sequence of the target nucleic acid and the second universal label, and the restriction cleavage site if cleavage was not efficient (e.g., the 5′ read products).

The read products can be amplified with sequencing library amplification primers. Sequencing library amplification primers can refer to primers used for addition of sequences that can be used in sequencing reactions (e.g., sequencing flow cell primers). A first primer of the sequencing library amplification primers can hybridize to the third universal label (e.g., on the degenerate primer used in random priming). A second primer of the sequencing library amplification primers can hybridize to the first universal label. The second primer may not be able to hybridize to the second universal label (e.g., because of the difference in sequence between the first universal label and the second universal label). In some embodiments, the 5′ read product may not be amplified by the first sequencing library amplification primer. In some embodiments, the 5′ read product can be amplified by the first sequencing library amplification primer less efficiently. In some embodiments, the 5′ read product can be amplified at least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% less efficiently than the 3′ read product. In some embodiments, the 5′ read product can be amplified at most 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% less efficiently than the 3′ read product. In some embodiments, the 3′ read product can be amplified at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, or 40 or more fold more than the 5′ read product. In some embodiments, the 3′ read product can be amplified at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, or 40 or more fold more than the 5′ read product. In some embodiments, the second universal label can be a second mechanism for breaking the symmetry of the quasi-symmetric stochastically barcoded nucleic acid (e.g., read products from only one side of the quasi-symmetric stochastically barcoded nucleic acid may be preferentially made).

The sequence of the read products can be determined (e.g., with a sequencing reaction). Reads from the sequencing reaction can preferentially map to the 3′ read product. In some embodiments, the number of reads of the 3′ read product can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, or 40 or more fold more than the reads from the 5′ product. In some embodiments, the number of reads of the 3′ read product can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, or 40 or more fold more than the reads from the 5′ product. In some embodiments, the 3′ products comprising the stochastic barcode (e.g., 3′ end) can be used to estimate the number of distinct target nucleic acids in the sample by counting the unique stochastic barcodes on the products.
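The counting step described above can be illustrated with a minimal Python sketch. It assumes that each 3′ read has already been reduced to a pair of target name and molecular label; the function name, the input layout, and the example labels are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def count_distinct_molecules(reads):
    """Estimate molecules per target by counting unique molecular labels.

    `reads` is an iterable of (target, molecular_label) pairs, one pair
    per sequenced 3' read product (an assumed, simplified input format).
    """
    labels_per_target = defaultdict(set)
    for target, molecular_label in reads:
        # Reads that share a target and a label are treated as
        # amplification copies of one original molecule.
        labels_per_target[target].add(molecular_label)
    return {target: len(labels) for target, labels in labels_per_target.items()}


# Hypothetical example reads; the second GAPDH read is a duplicate.
reads = [
    ("GAPDH", "ACGTACGT"),
    ("GAPDH", "ACGTACGT"),
    ("GAPDH", "TTGACCAA"),
    ("ACTB", "GGCATTCA"),
]
counts = count_distinct_molecules(reads)
```

In this sketch the number of reads per target plays no role in the estimate; only the number of distinct labels observed for that target is counted.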

In some embodiments, the read products can be prepared for a sequencing reaction. For example, the read products can be fragmented for adaptor ligation. Fragmentation can include mechanical fragmentation (e.g., Covaris focused electroacoustic, nebulizer, sonication, vortex) or enzymatic fragmentation (e.g., Fragmentase). Fragments can be any length. For example, fragments can be from 1 to 3,000,000 nucleotides in length.

In some embodiments, the disclosure provides for library preparation methods that may result in an asymmetric double-stranded cDNA (e.g., may not result in a quasi-symmetric stochastically barcoded nucleic acid, e.g., have ends that comprise different sequences, e.g., may not undergo suppressive PCR). As shown in FIGS. 26 and 27, a nucleic acid (e.g., RNA, mRNA, DNA) can be reverse transcribed, and copied, or duplicated into a double-stranded cDNA. The reverse transcription or primer extension event (depending on whether the starting material is RNA or DNA, respectively) can be performed with a primer that comprises a first sequence. The first sequence can, for example, comprise at least a portion of a first sequencing primer sequence (e.g., Illumina Read 1). In some embodiments, the first sequence can comprise at least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the sequencing primer sequence. In some embodiments, the first sequence can comprise at most 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the sequencing primer sequence.

The double-stranded cDNA comprising the first sequence can be contacted with adaptors. The adaptors can be double-stranded. The adaptors can be single-stranded. The adaptors can ligate to the 3′ ends of the double-stranded cDNA. The adaptors can ligate to the 5′ ends of the double-stranded cDNA. One strand of a double-stranded adaptor can ligate to the 3′ end of one strand of the double-stranded cDNA.

The adaptors can comprise a second sequence. The second sequence can comprise at least a portion of a second sequencing primer sequence (e.g., Illumina Read 2). In some embodiments, the second sequence can comprise at least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the sequence of the flow cell sequence. In some embodiments, the second sequence can comprise at most 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the sequence of the flow cell sequence.

The adaptor-ligated double-stranded cDNA can be amplified. Amplification can be used to complete the sequence of the first and second sequences (e.g., add in the full sequence that can be used in downstream applications, i.e., adding the full sequence for sequencing primer sequences (e.g., Illumina Read 1 and 2) and flow cell adaptor hybridization). The flow cell sequences (e.g., Illumina flow cell sequences) can be added during amplification. Amplification can be used to increase the amount of the adaptor-ligated double-stranded cDNA. Amplification can be performed with primers comprising at least a portion of the adaptor sequence. The primers can comprise additional sequence to be added (e.g., to complete the sequence of the flow cell primers). The PCR amplified molecules can be used in sequencing.

The strand sequenced can be determined based on the reads from the sequencing. Reads comprising the first sequence originated from a first strand of the double-stranded adaptor-ligated molecule. Reads comprising the second sequence originated from a second strand of the double-stranded adaptor-ligated molecule. The library preparation methods can be used for RNA-seq or DNA-seq. The methods can be used to determine which strand of an RNA molecule is involved in regulation (e.g., the antisense strand). The methods can be used to determine base pair resolution for footprinting assays (e.g., ChIP-Seq). The methods may not comprise degradation of one strand to preserve directionality (e.g., removal of one strand allowing only the other strand to be sequenced).
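The strand assignment described above can be sketched in Python as follows. This is a simplified illustration only: it assumes that each read contains at most one of the two sequences, and the two tag sequences are hypothetical placeholders rather than actual sequencing primer sequences.

```python
# Hypothetical placeholder tags. A real library would use the actual
# first and second sequences (e.g., portions of Read 1 and Read 2).
FIRST_SEQUENCE = "AAAACCCC"
SECOND_SEQUENCE = "GGGGTTTT"


def assign_strand(read, first_sequence=FIRST_SEQUENCE,
                  second_sequence=SECOND_SEQUENCE):
    """Assign a read to a strand based on which sequence it comprises."""
    has_first = first_sequence in read
    has_second = second_sequence in read
    if has_first and not has_second:
        return "first strand"
    if has_second and not has_first:
        return "second strand"
    # Reads with neither sequence, or with both, are left unassigned.
    return "ambiguous"


example_reads = ["AAAACCCCTTAGGC", "GGGGTTTTCATCGA", "TTAGGCATCGA"]
assignments = [assign_strand(read) for read in example_reads]
```

Exact substring matching is used here for brevity; a practical pipeline would likely tolerate sequencing errors in the tag sequences.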

Sequencing

Determining the number of different stochastically labeled nucleic acids can comprise determining the sequence of the labeled target, the spatial label, the molecular label, the sample label, and the cellular label or any product thereof (e.g., labeled-amplicons, labeled-cDNA molecules). An amplified target can be subjected to sequencing. Determining the sequence of the stochastically labeled nucleic acid or any product thereof can comprise conducting a sequencing reaction to determine the sequence of at least a portion of a sample label, a spatial label, a cellular label, a molecular label, and/or at least a portion of the stochastically labeled target, a complement thereof, a reverse complement thereof, or any combination thereof.

Determination of the sequence of a nucleic acid (e.g., amplified nucleic acid, labeled nucleic acid, cDNA copy of a labeled nucleic acid, etc.) can be performed using a variety of sequencing methods including, but not limited to, sequencing by synthesis (SBS), sequencing by hybridization (SBH), sequencing by ligation (SBL), quantitative incremental fluorescent nucleotide addition sequencing (QIFNAS), stepwise ligation and cleavage, fluorescence resonance energy transfer (FRET), molecular beacons, TaqMan reporter probe digestion, pyrosequencing, fluorescent in situ sequencing (FISSEQ), FISSEQ beads, wobble sequencing, multiplex sequencing, polymerized colony (POLONY) sequencing, nanogrid rolling circle sequencing (ROLONY), allele-specific oligo ligation assays (e.g., oligo ligation assay (OLA), single template molecule OLA using a ligated linear probe and a rolling circle amplification (RCA) readout, ligated padlock probes, or single template molecule OLA using a ligated circular padlock probe and a rolling circle amplification (RCA) readout), and the like.

In some embodiments, determining the sequence of the labeled nucleic acid or any product thereof comprises paired-end sequencing, nanopore sequencing, high-throughput sequencing, shotgun sequencing, dye-terminator sequencing, multiple-primer DNA sequencing, primer walking, Sanger dideoxy sequencing, Maxam-Gilbert sequencing, pyrosequencing, true single molecule sequencing, or any combination thereof. Alternatively, the sequence of the labeled nucleic acid or any product thereof can be determined by electron microscopy or a chemical-sensitive field effect transistor (chemFET) array.

High-throughput sequencing methods, such as cyclic array sequencing using platforms such as Roche 454, Illumina Solexa, ABI-SOLiD, Ion Torrent, Complete Genomics, Pacific Biosciences, Helicos, or the Polonator platform, can also be utilized. Sequencing can comprise MiSeq sequencing. Sequencing can comprise HiSeq sequencing.

In some embodiments, the stochastically labeled targets can comprise nucleic acids representing from about 0.01% of the genes of an organism's genome to about 100% of the genes of an organism's genome. For example, about 0.01% of the genes of an organism's genome to about 100% of the genes of an organism's genome can be sequenced using a target complementary region comprising a plurality of multimers by capturing the genes containing a complementary sequence from the sample. In some embodiments, the labeled nucleic acids comprise nucleic acids representing from about 0.01% of the transcripts of an organism's transcriptome to about 100% of the transcripts of an organism's transcriptome. For example, about 0.01% of the transcripts of an organism's transcriptome to about 100% of the transcripts of an organism's transcriptome can be sequenced using a target complementary region comprising a poly-T tail by capturing the mRNAs from the sample.

In some embodiments, sequencing can comprise sequencing at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode. In some embodiments, sequencing can comprise sequencing at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode. In some embodiments, sequencing can comprise sequencing at least about 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode. In some embodiments, sequencing can comprise sequencing at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode. In some embodiments, sequencing can comprise sequencing at least about 1,500; 2,000; 3,000; 4,000; 5,000; 6,000; 7,000; 8,000; 9,000; or 10,000 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode. In some embodiments, sequencing can comprise sequencing at most about 1,500; 2,000; 3,000; 4,000; 5,000; 6,000; 7,000; 8,000; 9,000; or 10,000 or more nucleotides or base pairs of the labeled nucleic acid and/or stochastic barcode.

In some embodiments, sequencing can comprise at least about 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more sequencing reads per run. In some embodiments, sequencing can comprise at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000 or more sequencing reads per run. In some embodiments, sequencing comprises sequencing at least about 1,500; 2,000; 3,000; 4,000; 5,000; 6,000; 7,000; 8,000; 9,000; or 10,000 or more sequencing reads per run. In some embodiments, sequencing comprises sequencing at most about 1,500; 2,000; 3,000; 4,000; 5,000; 6,000; 7,000; 8,000; 9,000; or 10,000 or more sequencing reads per run. In some embodiments, sequencing can comprise sequencing at least 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950 or 1000 or more millions of sequencing reads per run. In some embodiments, sequencing can comprise sequencing at most 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950 or 1000 or more millions of sequencing reads per run. In some embodiments, sequencing can comprise sequencing at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 2000, 3000, 4000, or 5000 or more millions of sequencing reads in total. In some embodiments, sequencing can comprise sequencing at most 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 2000, 3000, 4000, or 5000 or more millions of sequencing reads in total. In some embodiments, sequencing can comprise less than or equal to about 1,600,000,000 sequencing reads per run. In some embodiments, sequencing can comprise less than or equal to about 200,000,000 reads per run.

RNA-Seq Library Generation Using Adaptor Ligation

In some instances, the disclosure provides for a method for generating an RNA-seq library using adaptor ligation. Sequencing on next generation sequencers can use platform-specific adaptor sequences on both ends of the nucleic acid to be sequenced. An adaptor can be ligated to both strands of second-strand cDNA followed by PCR amplification; however, the adaptors can be prone to dimer formation, which can lead to artifacts. The directionality of the original sample may be lost (e.g., differentiation of the 5′ and 3′ end of the sample). New methods in adaptor ligation can be useful for improving library preparation for next generation sequencing, such as those shown in FIGS. 26 and 27.

As shown in FIG. 26, an RNA 2605 (e.g., mRNA or fragmented RNA) can be contacted with a primer 2610 comprising a first sequence (e.g., a portion of a first sequencing primer sequence, e.g., Illumina Read 1). The RNA 2605 can be reverse transcribed, thereby generating a first cDNA strand. The first cDNA strand can be copied during second strand synthesis, thereby generating a double-stranded cDNA 2615. Adaptors 2620 can be ligated to the double-stranded cDNA 2615. The adaptors 2620 can comprise a second sequence (e.g., a portion of a second sequencing primer sequence, e.g., Illumina Read 2). The adaptors 2620 can be ligated to the 3′ end, the 5′ end, or both the 3′ and 5′ ends of the double-stranded cDNA 2615. The adaptors 2620 can be double-stranded, but only one strand of the adaptor can ligate to one strand of the double-stranded cDNA 2615. The first and second sequencing primer sequences can be completed by PCR amplification using primers 2625. The amplification primers 2625 can comprise a portion of the sequencing primer sequences. The amplification primers 2625 can comprise a flow cell sequencing primer. The resulting molecule 2630 can be sequenced. Reads comprising the adaptor sequence 2620 can correspond to a first strand of the molecule 2630. Reads comprising the first sequence 2610 can correspond to a second strand of the molecule 2630. In this way, an RNA-seq library can be prepared with adaptors that preserve the directionality of the molecule.

DNA-Seq Library Generation Using Adaptor Ligation

In some embodiments, the disclosure provides for an adaptor ligation method for preparing DNA sequencing libraries as shown in FIG. 27. A DNA 2705 can be contacted with a primer 2710 comprising a first sequence (e.g., a portion of a first primer sequence, e.g., Illumina Read 1 sequence). The DNA can be extended, thereby generating a double-stranded cDNA. Adaptors 2715 can be ligated to the double-stranded cDNA. The adaptors 2715 can comprise a second sequence (e.g., a portion of a second primer sequence, e.g., Illumina Read 2 sequence). The adaptors 2715 can be ligated to the 3′ end, the 5′ end, or both the 3′ and 5′ ends of the double-stranded cDNA. The adaptors 2715 can be double-stranded, but only one strand of the adaptor can ligate to one strand of the double-stranded cDNA. The first and second sequencing primer sequences can be completed by PCR amplification using primers 2720. The amplification primers 2720 can comprise a portion of the sequencing primer sequences. The amplification primers 2720 can comprise a flow cell sequencing primer. The resulting molecule 2725 can be sequenced. Reads comprising the adaptor sequence 2715 can correspond to a first strand of the molecule 2725. Reads comprising the first sequence 2710 can correspond to a second strand of the molecule 2725. In this way, a DNA-seq library can be prepared with adaptors that preserve the directionality of the molecule.

Diffusion Across a Substrate

When a sample (e.g., cell) is stochastically barcoded according to the methods of the disclosure, the cell can be lysed. Lysis of a cell can result in the diffusion of the contents of the lysis (e.g., cell contents) away from the initial location of lysis. In other words, the lysis contents can move into a larger surface area than the surface area taken up by the cell.

Diffusion of sample lysis mixture (e.g., comprising targets) can be modulated by various parameters including, but not limited to, viscosity of the lysis mixture, temperature of the lysis mixture, the size of the targets, the size of physical barriers in a substrate, the concentration of the lysis mixture, and the like. For example, the lysis reaction can be performed at a temperature of at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, or 40 °C or more. In some embodiments, the lysis reaction can be performed at a temperature of at most 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, or 40 °C or more. The viscosity of the lysis mixture can be altered by, for example, adding thickening reagents (e.g., glycerol, beads) to slow the rate of diffusion. The viscosity of the lysis mixture can be altered by, for example, adding thinning reagents (e.g., water) to increase the rate of diffusion. A substrate can comprise physical barriers (e.g., wells, microwells, microhills) that can alter the rate of diffusion of targets from a sample. The concentration of the lysis mixture can be altered to increase or decrease the rate of diffusion of targets from a sample. In some embodiments, the concentration of a lysis mixture can be increased or decreased by at least 1, 2, 3, 4, 5, 6, 7, 8, or 9 or more fold. In some embodiments, the concentration of a lysis mixture can be increased or decreased by at most 1, 2, 3, 4, 5, 6, 7, 8, or 9 or more fold.

The rate of diffusion can be increased or decreased. In some embodiments, the rate of diffusion of a lysis mixture can be increased or decreased by at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more fold compared to an un-altered lysis mixture. In some embodiments, the rate of diffusion of a lysis mixture can be increased or decreased by at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more fold compared to an un-altered lysis mixture. In some embodiments, the rate of diffusion of a lysis mixture can be increased or decreased by at least 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% compared to an un-altered lysis mixture. In some embodiments, the rate of diffusion of a lysis mixture can be increased or decreased by at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% compared to an un-altered lysis mixture.

Data Analysis and Display Software

Data Analysis and Visualization of Spatial Resolution of Targets

The disclosure provides for methods for estimating the number and position of targets with stochastic barcoding and digital counting using spatial labels. The data obtained from the methods of the disclosure can be visualized on a map. A map of the number and location of targets from a sample can be constructed using information generated using the methods described herein. The map can be used to locate a physical location of a target. The map can be used to identify the location of multiple targets. The multiple targets can be the same species of target, or the multiple targets can be multiple different targets. For example, a map of a brain can be constructed to show the digital count and location of multiple targets.

The map can be generated from data from a single sample. The map can be constructed using data from multiple samples, thereby generating a combined map. The map can be constructed with data from tens, hundreds, and/or thousands of samples. A map constructed from multiple samples can show a distribution of digital counts of targets associated with regions common to the multiple samples. For example, replicated assays can be displayed on the same map. In some embodiments, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more replicates can be displayed (e.g., overlaid) on the same map. In some embodiments, at most 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more replicates can be displayed (e.g., overlaid) on the same map. The spatial distribution and number of targets can be represented by a variety of statistics.
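The construction of a combined map from replicates can be sketched in Python as follows. The sketch assumes that each counted molecule has already been decoded to a pair of grid coordinates (derived from its spatial label) and a target name, and that the replicates have been registered to a common coordinate system; the function names, coordinates, and target names are hypothetical.

```python
from collections import Counter


def build_count_map(observations):
    """Build a map of digital counts keyed by (x, y, target).

    `observations` is an iterable of (x, y, target) tuples, one tuple
    per counted molecule (an assumed, simplified input format).
    """
    count_map = Counter()
    for x, y, target in observations:
        count_map[(x, y, target)] += 1
    return count_map


def overlay_maps(maps):
    """Overlay replicate maps by summing counts at each shared position."""
    combined = Counter()
    for count_map in maps:
        combined.update(count_map)  # Counter.update adds counts
    return combined


replicate_1 = build_count_map(
    [(0, 0, "targetA"), (0, 0, "targetA"), (1, 2, "targetB")]
)
replicate_2 = build_count_map(
    [(0, 0, "targetA"), (1, 2, "targetB"), (1, 2, "targetB")]
)
combined = overlay_maps([replicate_1, replicate_2])
```

Summing is only one possible way to represent the overlaid replicates; a mean or another statistic per position could be substituted without changing the structure of the sketch.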

Combining data from multiple samples can increase the locational resolution of the combined map. The orientation of multiple samples can be registered by common landmarks, wherein the individual locational measurements across samples are at least in part non-contiguous. A particular example is sectioning a sample using a microtome on one axis and then sectioning a second sample along a different axis. The combined dataset can give three dimensional spatial locations associated with digital counts of targets. Multiplexing the above approach can allow for high resolution three dimensional maps of digital counting statistics.

In some embodiments, the system comprises computer-readable media that includes code for providing data analysis for the sequence datasets generated by performing single cell, stochastic barcoding assays. Examples of data analysis functionality that can be provided by the data analysis software include, but are not limited to, (i) algorithms for decoding/demultiplexing of the sample label, cell label, spatial label, and molecular label, and target sequence data provided by sequencing the stochastic barcode library created in running the assay, (ii) algorithms for determining the number of reads per gene per cell, and the number of unique transcript molecules per gene per cell, based on the data, and creating summary tables, (iii) statistical analysis of the sequence data, e.g., for clustering of cells by gene expression data, or for predicting confidence intervals for determinations of the number of transcript molecules per gene per cell, etc., (iv) algorithms for identifying sub-populations of rare cells, for example, using principal component analysis, hierarchical clustering, k-means clustering, self-organizing maps, neural networks, etc., (v) sequence alignment capabilities for alignment of gene sequence data with known reference sequences and detection of mutations, polymorphic markers and splice variants, and (vi) automated clustering of molecular labels to compensate for amplification or sequencing errors. In some embodiments, commercially-available software can be used to perform all or a portion of the data analysis, for example, the Seven Bridges (https://www.sbgenomics.com/) software can be used to compile tables of the number of copies of one or more genes occurring in each cell for the entire collection of cells. In some embodiments, the data analysis software can include options for outputting the sequencing results in useful graphical formats, e.g., heatmaps that indicate the number of copies of one or more genes occurring in each cell of a collection of cells. In some embodiments, the data analysis software can further comprise algorithms for extracting biological meaning from the sequencing results, for example, by correlating the number of copies of one or more genes occurring in each cell of a collection of cells with a type of cell, a type of rare cell, or a cell derived from a subject having a specific disease or condition. In some embodiments, the data analysis software can further comprise algorithms for comparing populations of cells across different biological samples.
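Items (ii) and (vi) above can be illustrated together with a minimal Python sketch. It assumes that reads have already been demultiplexed into tuples of cell label, gene, and molecular label. The greedy merging of labels within a Hamming distance of 1 is one simple clustering heuristic chosen for illustration, not a method stated in the disclosure, and all names and example data are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length labels."""
    return sum(x != y for x, y in zip(a, b))


def collapse_labels(label_counts, max_distance=1):
    """Greedily merge low-count labels into similar, more abundant ones.

    Labels are visited from most to least abundant; a label is kept only
    if no already-kept label lies within `max_distance` mismatches.
    """
    kept = []
    ordered = sorted(label_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for label, _ in ordered:
        is_error_copy = any(
            len(label) == len(k) and hamming(label, k) <= max_distance
            for k in kept
        )
        if not is_error_copy:
            kept.append(label)
    return kept


def summarize(reads):
    """Return {(cell, gene): {"reads": n, "molecules": m}}."""
    per_key = defaultdict(Counter)
    for cell, gene, label in reads:
        per_key[(cell, gene)][label] += 1
    return {
        key: {
            "reads": sum(label_counts.values()),
            "molecules": len(collapse_labels(label_counts)),
        }
        for key, label_counts in per_key.items()
    }


# Hypothetical demultiplexed reads: (cell label, gene, molecular label).
reads = (
    [("cell1", "GENE_A", "AAAA")] * 3
    + [("cell1", "GENE_A", "AAAT")]      # likely an error copy of AAAA
    + [("cell1", "GENE_A", "CCGG")] * 2
    + [("cell2", "GENE_A", "AAAA")]
)
table = summarize(reads)
```

In this example the single `AAAT` read is merged into the more abundant `AAAA` label, so `cell1` is reported with six reads but two molecules for `GENE_A`.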

In some embodiments, all of the data analysis functionality can be packaged within a single software package. In some embodiments, the complete set of data analysis capabilities can comprise a suite of software packages. In some embodiments, the data analysis software can be a standalone package that is made available to users independently of the assay instrument system. In some embodiments, the software can be web-based, and can allow users to share data.

System Processors and Networks

In general, the computer or processor included in the presently disclosed instrument systems can be further understood as a logical apparatus that can read instructions from media or a network port, which can optionally be connected to a server having fixed media. The system can include a CPU, disk drives, optional input devices such as a keyboard or mouse, and an optional monitor. Data communication can be achieved through the indicated communication medium to a server at a local or a remote location. The communication medium can include any means of transmitting or receiving data. For example, the communication medium can be a network connection, a wireless connection or an internet connection. Such a connection can provide for communication over the World Wide Web. It is envisioned that data relating to the present disclosure can be transmitted over such networks or connections for reception or review by a party.

An exemplary embodiment of a first example architecture of a computer system can be used in connection with example embodiments of the present disclosure. The example computer system can include a processor for processing instructions. Non-limiting examples of processors include: Intel Xeon™ processor, AMD Opteron™ processor, Samsung 32-bit RISC ARM 1176JZ(F)-S v1.0™ processor, ARM Cortex-A8 Samsung S5PC100™ processor, ARM Cortex-A8 Apple A4™ processor, Marvell PXA 930™ processor, or a functionally-equivalent processor. Multiple threads of execution can be used for parallel processing. In some embodiments, multiple processors or processors with multiple cores can also be used, whether in a single computer system, in a cluster, or distributed across systems over a network comprising a plurality of computers, cell phones, or personal data assistant devices.

A high speed cache can be connected to, or incorporated in, the processor to provide a high speed memory for instructions or data that have been recently, or are frequently, used by the processor. The processor can be connected to a north bridge by a processor bus. The north bridge is connected to random access memory (RAM) by a memory bus and manages access to the RAM by the processor. The north bridge can also be connected to a south bridge by a chipset bus. The south bridge is, in turn, connected to a peripheral bus. The peripheral bus can be, for example, PCI, PCI-X, PCI Express, or other peripheral bus. The north bridge and south bridge are often referred to as a processor chipset and manage data transfer between the processor, RAM, and peripheral components on the peripheral bus. In some alternative architectures, the functionality of the north bridge can be incorporated into the processor instead of using a separate north bridge chip.

The system can include an accelerator card attached to the peripheral bus. The accelerator can include field programmable gate arrays (FPGAs) or other hardware for accelerating certain processing. For example, an accelerator can be used for adaptive data restructuring or to evaluate algebraic expressions used in extended set processing.

Software and data can be stored in external storage and can be loaded into RAM or cache for use by the processor. The system includes an operating system for managing system resources; non-limiting examples of operating systems include: Linux, Windows™, MACOS™, BlackBerry OS™, iOS™, and other functionally-equivalent operating systems, as well as application software running on top of the operating system for managing data storage and optimization in accordance with example embodiments of the present invention.

For example, the system can also include network interface cards (NICs) connected to the peripheral bus for providing network interfaces to external storage, such as Network Attached Storage (NAS), and to other computer systems that can be used for distributed parallel processing.

An exemplary diagram of a network can comprise a plurality of computer systems, a plurality of cell phones and personal data assistants, and Network Attached Storage (NAS). In example embodiments, systems can manage data storage and optimize data access for data stored in Network Attached Storage (NAS). A mathematical model can be used for the data and be evaluated using distributed parallel processing across computer systems, and cell phone and personal data assistant systems. Computer systems, and cell phone and personal data assistant systems can also provide parallel processing for adaptive data restructuring of the data stored in Network Attached Storage (NAS). A wide variety of other computer architectures and systems can be used in conjunction with the various embodiments of the present invention. For example, a blade server can be used to provide parallel processing. Processor blades can be connected through a back plane to provide parallel processing. Storage can also be connected to the back plane or as Network Attached Storage (NAS) through a separate network interface.

In some embodiments, processors can maintain separate memory spaces and transmit data through network interfaces, back plane or other connectors for parallel processing by other processors. In other embodiments, some or all of the processors can use a shared virtual address memory space.

An exemplary block diagram of a multiprocessor computer system can comprise a shared virtual address memory space in accordance with an example embodiment. In some embodiments, the system can include a plurality of processors that can access a shared memory subsystem. In some embodiments, the system can incorporate a plurality of programmable hardware memory algorithm processors (MAPs) in the memory subsystem. Each MAP can comprise a memory and one or more field programmable gate arrays (FPGAs). The MAP can provide a configurable functional unit, and particular algorithms or portions of algorithms can be provided to the FPGAs for processing in close coordination with a respective processor. For example, the MAPs can be used to evaluate algebraic expressions regarding the data model and to perform adaptive data restructuring in example embodiments. In this example, each MAP is globally accessible by all of the processors for these purposes. In one configuration, each MAP can use Direct Memory Access (DMA) to access an associated memory, allowing it to execute tasks independently of, and asynchronously from, the respective microprocessor. In this configuration, a MAP can feed results directly to another MAP for pipelining and parallel execution of algorithms.

The above computer architectures and systems are examples only, and a wide variety of other computer, cell phone, and personal data assistant architectures and systems can be used in connection with example embodiments, including systems using any combination of general processors, co-processors, FPGAs and other programmable logic devices, system on chips (SOCs), application specific integrated circuits (ASICs), and other processing and logic elements. In some embodiments, all or part of the computer system can be implemented in software or hardware. Any variety of data storage media can be used in connection with example embodiments, including random access memory, hard drives, flash memory, tape drives, disk arrays, Network Attached Storage (NAS) and other local or distributed data storage devices and systems.

In some embodiments, the computer subsystem of the present disclosure can be implemented using software modules executing on any of the above or other computer architectures and systems. In other embodiments, the functions of the system can be implemented partially or completely in firmware, programmable logic devices such as field programmable gate arrays (FPGAs), system on chips (SOCs), application specific integrated circuits (ASICs), or other processing and logic elements. For example, the Set Processor and Optimizer can be implemented with hardware acceleration through the use of a hardware accelerator card.

Kits

Disclosed herein are kits for performing single cell, stochasticbarcoding assays. The kit can comprise one or more substrates (e.g.,microwell array), either as a free-standing substrate (or chip)comprising one or more microwell arrays, or packaged within one or moreflow-cells or cartridges. In some embodiments, the kits comprise one ormore solid support suspensions, wherein the individual solid supportswithin a suspension comprise a plurality of attached stochastic barcodesof the disclosure. In some embodiments, the kits comprise stochasticbarcodes that may not be attached to a solid support. In someembodiments, the kit can further comprise a mechanical fixture formounting a free-standing substrate in order to create reaction wellsthat facilitate the pipetting of samples and reagents into thesubstrate. The kit can further comprise reagents, e.g. lysis buffers,rinse buffers, or hybridization buffers, for performing the stochasticbarcoding assay. In some embodiments, the kit further comprise reagents(e.g. enzymes, primers, dNTPs, NTPs, RNAse inhibitors, or buffers) forperforming nucleic acid extension reactions, for example, reversetranscription reactions and primer extension reactions. In someembodiments, the kit further comprises reagents (e.g. enzymes, universalprimers, sequencing primers, target-specific primers, or buffers) forperforming amplification reactions to prepare sequencing libraries. Insome embodiments, the kit can comprise a ligase, a transposase, areverse transcriptase, a DNA polymerase, an RNase, an exonuclease, orany combination thereof. In some embodiments, the kit comprises reagentsfor homopolymer tailing of molecules (e.g., a terminal transferaseenzyme, and dNTPs). The kit can comprise reagents for, for example, anyenzymatic cleavage of the disclosure (e.g., ExoI nuclease, restrictionenzyme). In some embodiments, the kit can comprise reagents for adaptorligation (e.g., ligase enzyme, reducing reagent). 
In some embodiments, the kit can comprise reagents for library preparation (e.g., addition of sequencing library/flow cell primers), which can include sequencing/flow cell primers, enzymes for attaching the primers, dNTPs, etc.

In some embodiments, the kit comprises a whole transcriptome amplification primer of the disclosure. In some embodiments, the kit can comprise sequencing library amplification primers of the disclosure. In some embodiments, the kit can comprise a second strand synthesis primer of the disclosure. For example, the second strand synthesis primer can comprise a second universal label, a gene specific sequence, a random multimer sequence, a restriction enzyme cleavage site, a sequence complementary to a homopolymer tail, or any combination thereof. The kit can comprise any primers of the disclosure (e.g., gene-specific primers, random multimers, sequencing primers, and universal primers).

In some embodiments, the kit can comprise one or more molds, for example, molds comprising an array of micropillars, for casting substrates (e.g., microwell arrays), and one or more solid supports (e.g., bead), wherein the individual beads within a suspension comprise a plurality of attached stochastic barcodes of the disclosure. In some embodiments, the kit can further comprise a material for use in casting substrates (e.g. agarose, a hydrogel, PDMS, optical adhesive, and the like).

In some embodiments, the kit can comprise one or more substrates that are pre-loaded with solid supports comprising a plurality of attached stochastic barcodes of the disclosure. In some instances, there can be one solid support per microwell of the substrate. In some embodiments, the plurality of stochastic barcodes can be attached directly to a surface of the substrate, rather than to a solid support. In any of these embodiments, the one or more microwell arrays can be provided in the form of free-standing substrates (or chips), or they can be packed in flow-cells or cartridges.

In some embodiments, the kit can comprise one or more cartridges that incorporate one or more substrates. In some embodiments, the one or more cartridges can further comprise one or more pre-loaded solid supports, wherein the individual solid supports within a suspension comprise a plurality of attached stochastic barcodes of the disclosure. In some embodiments, the beads can be pre-distributed into the one or more microwell arrays of the cartridge. In some embodiments, the beads, in the form of suspensions, can be pre-loaded and stored within reagent wells of the cartridge. In some embodiments, the one or more cartridges can further comprise other assay reagents that are pre-loaded and stored within reagent reservoirs of the cartridges.

Kits can also include instructions for carrying out one or more of themethods described herein. Instructions included in kits can be affixedto packaging material or can be included as a package insert. While theinstructions are typically written or printed materials they are notlimited to such. Any medium capable of storing such instructions andcommunicating them to an end user is contemplated by the disclosure.Such media can include, but are not limited to, electronic storage media(e.g., magnetic discs, tapes, cartridges, chips, or any combinationthereof), optical media (e.g., CD ROM), RF tags, and the like. As usedherein, the term “instructions” can include the address of an internetsite that provides the instructions.

Systems

Disclosed herein are systems for generating a whole transcriptomeamplification (WTA) product from a plurality of single cells. In someembodiments, the systems disclosed herein can comprise a substratecomprising a plurality of partitions each comprising a single cell and asolid support immobilized with a plurality of nucleic acids. In someembodiments, the systems disclosed herein can comprise one or more solidsupport suspensions, wherein the individual solid supports within asuspension comprise a plurality of attached stochastic barcodes of thedisclosure. In some embodiments, the systems disclosed herein cancomprise stochastic barcodes that may not be attached to a solidsupport. In some embodiments, the systems disclosed herein can furthercomprise a mechanical fixture for mounting a free-standing substrate inorder to create reaction wells that facilitate the pipetting of samplesand reagents into the substrate. In some embodiments, the systemsdisclosed herein further comprise reagents, e.g. lysis buffers, rinsebuffers, or hybridization buffers, for performing the stochasticbarcoding assay. In some embodiments, the systems disclosed herein canfurther comprise reagents (e.g. enzymes, primers, dNTPs, NTPs, RNAseinhibitors, or buffers) for performing nucleic acid extension reactions,for example, reverse transcription reactions and primer extensionreactions. In some embodiments, the systems disclosed herein can furthercomprise reagents (e.g. enzymes, universal primers, sequencing primers,target-specific primers, or buffers) for performing amplificationreactions to prepare sequencing libraries. In some embodiments, thesystems disclosed herein can comprise a ligase, a transposase, a reversetranscriptase, a DNA polymerase, an RNase, an exonuclease, or anycombination thereof.

Devices

Flow Cells

The microwell array substrate can be packaged within a flow cell thatprovides for convenient interfacing with the rest of the fluid handlingsystem and facilitates the exchange of fluids, e.g. cell and solidsupport suspensions, lysis buffers, rinse buffers, etc., that aredelivered to the microwell array and/or emulsion droplet. Designfeatures can include: (i) one or more inlet ports for introducing cellsamples, solid support suspensions, or other assay reagents, (ii) one ormore microwell array chambers designed to provide for uniform fillingand efficient fluid-exchange while minimizing back eddies or dead zones,and (iii) one or more outlet ports for delivery of fluids to a samplecollection point or a waste reservoir. The design of the flow cell caninclude a plurality of microarray chambers that interface with aplurality of microwell arrays such that one or more different cellsamples can be processed in parallel. The design of the flow cell canfurther include features for creating uniform flow velocity profiles,i.e. “plug flow”, across the width of the array chamber to provide formore uniform delivery of cells and beads to the microwells, for example,by using a porous barrier located near the chamber inlet and upstream ofthe microwell array as a “flow diffuser”, or by dividing each arraychamber into several subsections that collectively cover the same totalarray area, but through which the divided inlet fluid stream flows inparallel. In some embodiments, the flow cell can enclose or incorporatemore than one microwell array substrate. In some embodiments, theintegrated microwell array/flow cell assembly can constitute a fixedcomponent of the system. In some embodiments, the microwell array/flowcell assembly can be removable from the instrument.

In some embodiments, the dimensions of fluid channels and the arraychamber(s) in flow cell designs are optimized to (i) provide uniformdelivery of cells and beads to the microwell array, and (ii) to minimizesample and reagent consumption. In some embodiments, the width of fluidchannels is between 50 um and 20 mm. In some embodiments, the width offluid channels can be at least 50 um, at least 100 um, at least 200 um,at least 300 um, at least 400 um, at least 500 um, at least 750 um, atleast 1 mm, at least 2.5 mm, at least 5 mm, at least 10 mm, at least 20mm, at least 50 mm, at least 100 mm, or at least 150 mm. In someembodiments, the width of fluid channels can be at most 150 mm, at most100 mm, at most 50 mm, at most 20 mm, at most 10 mm, at most 5 mm, atmost 2.5 mm, at most 1 mm, at most 750 um, at most 500 um, at most 400um, at most 300 um, at most 200 um, at most 100 um, or at most 50 um. Insome embodiments, the width of fluid channels is about 2 mm. The widthof the fluid channels can fall within any range bounded by any of thesevalues (e.g. from about 250 um to about 3 mm).

In some embodiments, the depth of the fluid channels is between 50 um and 2 mm. In other embodiments, the depth of fluid channels can be at least 50 um, at least 100 um, at least 200 um, at least 300 um, at least 400 um, at least 500 um, at least 750 um, at least 1 mm, at least 1.25 mm, at least 1.5 mm, at least 1.75 mm, or at least 2 mm. In yet other embodiments, the depth of fluid channels can be at most 2 mm, at most 1.75 mm, at most 1.5 mm, at most 1.25 mm, at most 1 mm, at most 750 um, at most 500 um, at most 400 um, at most 300 um, at most 200 um, at most 100 um, or at most 50 um. In one embodiment, the depth of the fluid channels is about 1 mm. The depth of the fluid channels can fall within any range bounded by any of these values (e.g. from about 800 um to about 1 mm).

Flow cells can be fabricated using a variety of techniques and materialsknown to those of skill in the art. In some embodiments, the flow cellis fabricated as a separate part and subsequently either mechanicallyclamped or permanently bonded to the microwell array substrate. Examplesof suitable fabrication techniques include conventional machining, CNCmachining, injection molding, 3D printing, alignment and lamination ofone or more layers of laser or die-cut polymer films, or any of a numberof microfabrication techniques such as photolithography and wet chemicaletching, dry etching, deep reactive ion etching, or lasermicromachining. Once the flow cell part has been fabricated it can beattached to the microwell array substrate mechanically, e.g. by clampingit against the microwell array substrate (with or without the use of agasket), or it can be bonded directly to the microwell array substrateusing any of a variety of techniques (depending on the choice ofmaterials used) known to those of skill in the art, for example, throughthe use of anodic bonding, thermal bonding, or any of a variety ofadhesives or adhesive films, including epoxy-based, acrylic-based,silicone-based, UV curable, polyurethane-based, or cyanoacrylate-basedadhesives.

Flow cells can be fabricated using a variety of materials known to thoseof skill in the art. In some embodiments, the choice of material useddepends on the choice of fabrication technique used, and vice versa.Examples of suitable materials include, but are not limited to, silicon,fused-silica, glass, any of a variety of polymers, e.g.polydimethylsiloxane (PDMS; elastomer), polymethylmethacrylate (PMMA),polycarbonate (PC), polypropylene (PP), polyethylene (PE), high densitypolyethylene (HDPE), polyimide, cyclic olefin polymers (COP), cyclicolefin copolymers (COC), polyethylene terephthalate (PET), epoxy resins,metals (e.g. aluminum, stainless steel, copper, nickel, chromium, andtitanium), a non-stick material such as teflon (PTFE), or a combinationof these materials.

Cartridges

In some embodiments of the system, the microwell array, with or withoutan attached flow cell, can be packaged within a consumable cartridgethat interfaces with the instrument system. Design features ofcartridges can include (i) one or more inlet ports for creating fluidconnections with the instrument or manually introducing cell samples,bead suspensions, or other assay reagents into the cartridge, (ii) oneor more bypass channels, i.e. for self-metering of cell samples and beadsuspensions, to avoid overfilling or back flow, (iii) one or moreintegrated microwell array/flow cell assemblies, or one or more chamberswithin which the microarray substrate(s) are positioned, (iv) integratedminiature pumps or other fluid actuation mechanisms for controllingfluid flow through the device, (v) integrated miniature valves (or othercontainment mechanisms) for compartmentalizing pre-loaded reagents (forexample, bead suspensions) or controlling fluid flow through the device,(vi) one or more vents for providing an escape path for trapped air,(vii) one or more sample and reagent waste reservoirs, (viii) one ormore outlet ports for creating fluid connections with the instrument orproviding a processed sample collection point, (ix) mechanical interfacefeatures for reproducibly positioning the removable, consumablecartridge with respect to the instrument system, and for providingaccess so that external magnets can be brought into close proximity withthe microwell array, (x) integrated temperature control components or athermal interface for providing good thermal contact with the instrumentsystem, and (xi) optical interface features, e.g. a transparent window,for use in optical interrogation of the microwell array.

The cartridge can be designed to process more than one sample in parallel. The cartridge can further comprise one or more removable sample collection chamber(s) that are suitable for interfacing with stand-alone PCR thermal cyclers or sequencing instruments. The cartridge itself can be suitable for interfacing with stand-alone PCR thermal cyclers or sequencing instruments. The term “cartridge” as used in this disclosure can be meant to include any assembly of parts which contains the sample and beads during performance of the assay.

The cartridge can further comprise components that are designed tocreate physical or chemical barriers that prevent diffusion of (orincrease pathlengths and diffusion times for) large molecules in orderto minimize cross-contamination between microwells. Examples of suchbarriers can include, but are not limited to, a pattern of serpentinechannels used for delivery of cells and solid supports (e.g., beads) tothe microwell array, a retractable platen or deformable membrane that ispressed into contact with the surface of the microwell array substrateduring lysis or incubation steps, the use of larger beads, e.g. Sephadexbeads as described previously, to block the openings of the microwells,or the release of an immiscible, hydrophobic fluid from a reservoirwithin the cartridge during lysis or incubation steps, to effectivelyseparate and compartmentalize each microwell in the array.

The dimensions of fluid channels and the array chamber(s) in cartridge designs can be optimized to (i) provide uniform delivery of cells and beads to the microwell array, and (ii) to minimize sample and reagent consumption. For example, the width of fluid channels can be between 50 micrometers and 20 mm. In some embodiments, the width of fluid channels can be at least 50 micrometers, at least 100 micrometers, at least 200 micrometers, at least 300 micrometers, at least 400 micrometers, at least 500 micrometers, at least 750 micrometers, at least 1 mm, at least 2.5 mm, at least 5 mm, at least 10 mm, or at least 20 mm. In some embodiments, the width of fluid channels can be at most 20 mm, at most 10 mm, at most 5 mm, at most 2.5 mm, at most 1 mm, at most 750 micrometers, at most 500 micrometers, at most 400 micrometers, at most 300 micrometers, at most 200 micrometers, at most 100 micrometers, or at most 50 micrometers. In some embodiments, the width of fluid channels can be about 2 mm. In some embodiments, the width of the fluid channels can fall within any range bounded by any of these values (e.g. from about 250 um to about 3 mm).

The fluid channels in the cartridge can have a depth. The depth of the fluid channels in cartridge designs can be, for example, between 50 micrometers and 2 mm. In some embodiments, the depth of fluid channels can be at least 50 micrometers, at least 100 micrometers, at least 200 micrometers, at least 300 micrometers, at least 400 micrometers, at least 500 micrometers, at least 750 micrometers, at least 1 mm, at least 1.25 mm, at least 1.5 mm, at least 1.75 mm, or at least 2 mm. In some embodiments, the depth of fluid channels can be at most 2 mm, at most 1.75 mm, at most 1.5 mm, at most 1.25 mm, at most 1 mm, at most 750 micrometers, at most 500 micrometers, at most 400 micrometers, at most 300 micrometers, at most 200 micrometers, at most 100 micrometers, or at most 50 micrometers. In some embodiments, the depth of the fluid channels can be about 1 mm. In some embodiments, the depth of the fluid channels can fall within any range bounded by any of these values (e.g. from about 800 micrometers to about 1 mm).

Cartridges can be fabricated using a variety of techniques and materialsknown to those of skill in the art. In some embodiments, the cartridgesare fabricated as a series of separate component parts and subsequentlyassembled using any of a number of mechanical assembly or bondingtechniques. Examples of suitable fabrication techniques include, but arenot limited to, conventional machining, CNC machining, injectionmolding, thermoforming, and 3D printing. Once the cartridge componentshave been fabricated they can be mechanically assembled using screws,clips, and the like, or permanently bonded using any of a variety oftechniques (depending on the choice of materials used), for example,through the use of thermal bonding/welding or any of a variety ofadhesives or adhesive films, including epoxy-based, acrylic-based,silicone-based, UV curable, polyurethane-based, or cyanoacrylate-basedadhesives.

Cartridge components can be fabricated using any of a number of suitablematerials, including but not limited to silicon, fused-silica, glass,any of a variety of polymers, e.g. polydimethylsiloxane (PDMS;elastomer), polymethylmethacrylate (PMMA), polycarbonate (PC),polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE),polyimide, cyclic olefin polymers (COP), cyclic olefin copolymers (COC),polyethylene terephthalate (PET), epoxy resins, non-stick materials suchas teflon (PTFE), metals (e.g. aluminum, stainless steel, copper,nickel, chromium, and titanium), or any combination thereof.

The inlet and outlet features of the cartridge can be designed toprovide convenient and leak-proof fluid connections with the instrument,or can serve as open reservoirs for manual pipetting of samples andreagents into or out of the cartridge. Examples of convenient mechanicaldesigns for the inlet and outlet port connectors can include, but arenot limited to, threaded connectors, Luer lock connectors, Luer slip or“slip tip” connectors, press fit connectors, and the like. The inlet andoutlet ports of the cartridge can further comprise caps, spring-loadedcovers or closures, or polymer membranes that can be opened or puncturedwhen the cartridge is positioned in the instrument, and which serve toprevent contamination of internal cartridge surfaces during storage orwhich prevent fluids from spilling when the cartridge is removed fromthe instrument. The one or more outlet ports of the cartridge canfurther comprise a removable sample collection chamber that is suitablefor interfacing with stand-alone PCR thermal cyclers or sequencinginstruments.

The cartridge can include, for example, integrated miniature pumps orother fluid actuation mechanisms for control of fluid flow through thedevice. Examples of suitable miniature pumps or fluid actuationmechanisms can include, but are not limited to, electromechanically- orpneumatically-actuated miniature syringe or plunger mechanisms, membranediaphragm pumps actuated pneumatically or by an external piston,pneumatically-actuated reagent pouches or bladders, or electro-osmoticpumps.

The cartridge can include, for example, miniature valves forcompartmentalizing pre-loaded reagents or controlling fluid flow throughthe device. Examples of suitable miniature valves can include, but arenot limited to, one-shot “valves” fabricated using wax or polymer plugsthat can be melted or dissolved, or polymer membranes that can bepunctured; pinch valves constructed using a deformable membrane andpneumatic, magnetic, electromagnetic, or electromechanical (solenoid)actuation, one-way valves constructed using deformable membrane flaps,and miniature gate valves.

The cartridge can include, for example, vents for providing an escape path for trapped air. Vents can be constructed according to a variety of techniques, for example, using a porous plug of polydimethylsiloxane (PDMS) or other hydrophobic material that allows for capillary wicking of air but blocks penetration by water.

The mechanical interface features of the cartridge can provide for easily removable but highly precise and repeatable positioning of the cartridge relative to the instrument system. Suitable mechanical interface features can include, but are not limited to, alignment pins, alignment guides, mechanical stops, and the like. The mechanical design features can include relief features for bringing external apparatus, e.g. magnets or optical components, into close proximity with the microwell array chamber.

In some embodiments, the cartridge can include temperature controlcomponents or thermal interface features for mating to externaltemperature control modules. Examples of suitable temperature controlelements can include, but are not limited to, resistive heatingelements, miniature infrared-emitting light sources, Peltier heating orcooling devices, heat sinks, thermistors, thermocouples, and the like.Thermal interface features can be fabricated from materials that aregood thermal conductors (e.g. copper, gold, silver, etc.) and cancomprise one or more flat surfaces capable of making good thermalcontact with external heating blocks or cooling blocks.

In some embodiments, the cartridge can include optical interfacefeatures for use in optical imaging or spectroscopic interrogation ofthe microwell array. The cartridge can include an optically transparentwindow, e.g. the microwell substrate itself or the side of the flow cellor microarray chamber that is opposite the microwell array, fabricatedfrom a material that meets the spectral requirements for the imaging orspectroscopic technique used to probe the microwell array. Examples ofsuitable optical window materials can include, but are not limited to,glass, fused-silica, polymethylmethacrylate (PMMA), polycarbonate (PC),cyclic olefin polymers (COP), or cyclic olefin copolymers (COC).

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein can be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

EXAMPLES

The following examples are offered to illustrate but not to limit the invention.

In order to facilitate understanding, the specific embodiments are provided to help interpret the technical proposal; that is, these embodiments are for illustrative purposes only and do not in any way limit the scope of the invention. Unless otherwise specified, embodiments that do not indicate specific conditions were carried out in accordance with conventional conditions or the manufacturer's recommended conditions.

Example 1: First Strand cDNA Synthesis Using an Oligo dT Primer with a Universal Sequence at the 5′ End

This example describes the adaptor ligation method of the disclosure. First strand cDNA synthesis using an oligo dT primer with a universal sequence at its 5′ end was performed. Following first strand synthesis, second strand cDNA was synthesized by the Gubler-Hoffman method (DNA Pol I, RNaseH, and E. coli DNA Ligase). Following second strand synthesis, the ends of the cDNA were optionally further processed (i.e. blunting or dA-tailing), and the 3′ ends of the cDNA were ligated to a 5′ phosphorylated adapter with the same or a different universal priming sequence. Because the adapter was partially single stranded and 5′ phosphorylated, it can ligate only to the cDNA, with negligible self-ligation. The cDNA was then amplified by PCR using the universal sequences from the original RT primer and the ligated adapter as priming sites. The priming sites could be the same sequence, or the first and second universal primer sequences of the disclosure, and the amplification could involve semi-suppressive PCR to prevent artifact formation.

In a specific exemplary embodiment, the method was used to generate amplified cDNA from individual samples ranging from 1-100 ng total RNA, and from 96 samples of 1 ng each simultaneously. 1 ng, 10 ng, or 100 ng of Human Brain Reference RNA (HBRR) spiked with 10,000 barcoded control RNAs per reaction was used as the template for first strand cDNA synthesis using 200 nM RT primer consisting of a pool of all 96 RT primers from the Precise™ assay. Second strand cDNA was generated using the NEBNext 2^(nd) strand cDNA module according to manufacturer's protocol. Five minutes before the end of the reaction, 3U of T4 DNA polymerase was added to ensure blunt cDNA, the reaction was terminated by addition of EDTA, and the product was purified using a 1.8× ratio of AmpureXP beads. 300 nM phosphorylated adapter CBO102/103 (annealed /5Phos/GCGATCGCGATCGGAAGAGCACACGTCTGA (SEQ ID NO: 1) and CTTCCGATCGCGATCGC (SEQ ID NO: 2)) was ligated to the blunted cDNA using the NEBNext Quick ligation module. Ligated cDNA was purified with a 1× volume of AmpureXP beads and the eluted product was amplified using 2×Q5 HotStart Mastermix (NEB) and 500 nM of the primer CBO23 (sequence: TCAGACGTGTGCTCTTCC (SEQ ID NO: 3)). 5 μl of the resulting PCR product was resolved on a 1% TBE Agarose gel (See FIG. 6A, Lanes: 1) NEB 2-log ladder, 2) 100 ng HBRR, 3) 10 ng HBRR, 4) 1 ng HBRR). Efficiency of second strand synthesis and ligation was determined by the mass yield of WTA product as determined by Nanodrop (See FIG. 6C), as well as by counting control spike-in molecules (kan, dap, phe) with the Pixel™ instrument (See FIG. 6B).
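The relationship between the adapter strands and the amplification primer listed above can be checked computationally. The following is a minimal sketch, not part of the disclosed method: it assumes the sequences are written 5′ to 3′ as listed, omits the /5Phos/ modification, and uses hypothetical helper names (`reverse_complement`, `duplex_and_overhang`). It shows that the short strand is complementary to the 5′ portion of the long strand, which leaves a single-stranded 3′ tail, and that CBO23 matches the complement of that tail region.

```python
# Sequences copied from this example, written 5'->3'.
CBO102 = "GCGATCGCGATCGGAAGAGCACACGTCTGA"  # SEQ ID NO: 1 (5' phosphate omitted)
CBO103 = "CTTCCGATCGCGATCGC"               # SEQ ID NO: 2
CBO23 = "TCAGACGTGTGCTCTTCC"               # SEQ ID NO: 3

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def duplex_and_overhang(long_strand: str, short_strand: str) -> tuple[int, str]:
    """Locate where short_strand anneals on long_strand.

    Returns the start index of the duplex on long_strand and the part of
    long_strand that remains single stranded on its 3' side.
    """
    target = reverse_complement(short_strand)
    start = long_strand.find(target)
    if start < 0:
        raise ValueError("strands are not complementary")
    return start, long_strand[start + len(target):]


start, overhang = duplex_and_overhang(CBO102, CBO103)
# The PCR primer anneals to the strand copied from the long adapter oligo.
primer_matches = reverse_complement(CBO102).startswith(CBO23)

print(f"duplex starts at position {start} of CBO102")
print(f"single-stranded 3' tail: {overhang} ({len(overhang)} nt)")
print(f"CBO23 matches complement of adapter 3' end: {primer_matches}")
```

Under these assumptions the duplex covers the first 17 nucleotides of CBO102 and the remaining 13 nucleotides form the single-stranded region, consistent with the description of the adapter as partially single stranded.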

Example 2: First Strand cDNA Synthesis Using an Oligo dT Primer with a Universal Sequence at the 5′ End

1 ng Human Brain Reference RNA or Human Universal Reference RNA was added to each well of a 96 well plate (e.g., Precise™ Assay plate). First strand cDNA was synthesized according to the manufacturer's protocol (e.g., 42° C. for 30 min, followed by 80° C. for 5 min, hold at 4° C.) and purified using a 1× ratio of AmpureXP beads, with elution into water.

Second strand cDNA was generated using the NEBNext 2^(nd) strand cDNA module according to manufacturer's protocol (e.g., combination of buffers, incubation of 2.5 hours at 16° C.). Five minutes before the end of the reaction, 3U of T4 DNA polymerase was added to ensure blunt cDNA, the reaction was terminated by addition of EDTA, and the product was purified using a 1.8× ratio of AmpureXP beads, with elution into water. 300 nM phosphorylated adapter CBO105/106 (annealed /5Phos/GCGATCGCGGAAGAGCACACGTCTGA (SEQ ID NO: 4) and GCTCTTCCGCGATCGC (SEQ ID NO: 5)) was ligated to the blunted cDNA using the NEBNext Quick ligation module by incubation for 30 min at room temperature. Ligated cDNA was purified with a 1× volume of AmpureXP beads and the eluted product was amplified using 2×Q5 HotStart Mastermix (NEB) and 500 nM of the primer CBO23 (sequence: TCAGACGTGTGCTCTTCC (SEQ ID NO: 6)). Amplification comprised one cycle of 98° C. for 30 seconds; 15 cycles of 98° C. for 20 s, 65° C. for 15 s, and 72° C. for 3 minutes; and one cycle of 72° C. for 5 min. The size distribution of the WTA product was determined by Agilent Bioanalyzer: 30 ng/μl WTA product from the 96×1 ng WTA plate was diluted 1:10 and run on an Agilent Bioanalyzer high sensitivity assay. FIG. 7 shows the WTA size distribution, which ranges from ˜200 bp to 3 kb, with the peak at ˜1.2 kb.
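The amplification program above can be written down as data, which makes the total programmed hold time easy to compute. The following is an illustrative sketch only; the `Step` class and `hold_time_seconds` function are hypothetical names, and the result excludes ramp times between temperatures, which depend on the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold of a thermocycling program."""
    temp_c: int
    seconds: int


# Program as described in this example.
INITIAL = [Step(98, 30)]                             # one cycle of 98 C for 30 s
CYCLE = [Step(98, 20), Step(65, 15), Step(72, 180)]  # repeated 15 times
FINAL = [Step(72, 300)]                              # one cycle of 72 C for 5 min
N_CYCLES = 15


def hold_time_seconds(initial, cycle, n_cycles, final) -> int:
    """Sum programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in cycle)
    return (
        sum(step.seconds for step in initial)
        + n_cycles * per_cycle
        + sum(step.seconds for step in final)
    )


total_s = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"programmed hold time: {total_s} s ({total_s / 60:.2f} min)")
```

With the values listed, the programmed holds add up to 3555 s, a little under one hour before ramping is taken into account.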

The overall efficiency of second strand synthesis and adaptor ligation was measured by counting the 4320 control molecules with Pixel™, as shown in FIG. 8A. Following the 96×1 ng WTA as described above, the Dap and Phe plate spike-ins were quantified by Pixel™. The Pixel™ counts were then compared to the total number of spike-in molecules in the plate, 4320, to determine overall reaction efficiency.
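The efficiency comparison described above is a simple ratio of counted spike-in molecules to the number of spike-in molecules in the plate. The sketch below illustrates the arithmetic; the function name is hypothetical and the count passed to it is a made-up placeholder, not a measured result from this disclosure.

```python
TOTAL_SPIKE_INS = 4320  # total spike-in molecules in the plate, from the text


def reaction_efficiency(counted: int, total: int = TOTAL_SPIKE_INS) -> float:
    """Fraction of input spike-in molecules recovered after the workflow."""
    if total <= 0:
        raise ValueError("total must be positive")
    if counted < 0:
        raise ValueError("counted must be non-negative")
    return counted / total


# Placeholder count for illustration only (not data from the disclosure).
example_count = 1080
print(f"overall efficiency: {reaction_efficiency(example_count):.1%}")
```

A value above 1.0 would indicate over-counting (for example, barcode errors inflating the molecule count), so in practice the ratio is worth sanity-checking before it is reported.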

Example 3: Second Strand Synthesis with RNaseH and a Strand Displacing Polymerase

The method disclosed in this example was used to amplify cDNA from 10 ng total RNA. cDNA was synthesized according to the method of Example 1. Second strand synthesis was performed using RNaseH and a strand-displacing polymerase or a mixture of polymerases, at least one of which has strand displacement activity. This resulted in the generation of many overlapping second-strand cDNAs biased towards the 3′ end of the original RNA and containing the complement to the universal priming sequence at their 3′ ends. After synthesis, the RNA primer portion of the second strands was optionally removed by RNase or alkali treatment. This can aid full length extension by DNA polymerases. These second strand cDNAs were primed with a universal primer and extended to generate double stranded molecules. These double stranded molecules were ligated and amplified as described in Example 1. Using a strand displacing polymerase can increase signal amplification by generating several second strand cDNAs for every first strand.

In a specific exemplary embodiment, 10 ng Human Brain Reference RNA (HBRR) spiked with 10,000 barcoded control RNAs per reaction was used as the template for first strand cDNA synthesis using 200 nM RT primer consisting of a pool of all 96 RT primers from the Precise™ assay. Second strand synthesis was performed using 18U Klenow(Exo−) and 15U RNaseH for 2 hrs at 16° C., followed by 30 min at 37° C., either in NEB 2^(nd) strand buffer or in ThermoPol buffer+200 nM dNTPs. The enzymes were heat inactivated, or the reaction was purified using 1.8× AmpureXP beads and the eluted cDNA was added to 1× ThermoPol buffer+dNTPs. 200 nM primer CBO17 (sequence: TCAGACGTGTGCTCTTCCGAT (SEQ ID NO: 7)) and 5U of Taq DNA polymerase were added and the reactions were incubated at 94° C. for 3 min, 55° C. for 30 sec, and 72° C. for 40 min to generate the complementary strand and A-tail the product. Reactions were purified using a 1.8× ratio of AmpureXP beads. 300 nM phosphorylated adapter CBO105/107 (annealed /5Phos/GCGATCGCGGAAGAGCACACGTCTGA (SEQ ID NO: 8) and GCTCTTCCGCGATCGC*T (SEQ ID NO: 9)) was ligated to the tailed cDNA using the NEBNext Quick ligation module. Ligated cDNA was purified with a 1× volume of AmpureXP beads and the eluted product was amplified using 2×Q5 HotStart Mastermix (NEB) and 500 nM of the primer CBO23 (sequence: TCAGACGTGTGCTCTTCC (SEQ ID NO: 10)). 5 μl of the resulting PCR product was resolved on a 1% TBE Agarose gel (See FIG. 9A, Lanes: 1) NEB 2-log ladder, 2) 10 ng HBRR second strand synthesis and Taq extension done in NEB second strand buffer, 3) 10 ng HBRR second strand synthesis and Taq extension done in NEB ThermoPol buffer, 4) 10 ng HBRR second strand synthesis done in NEB second strand buffer followed by purification and Taq extension in ThermoPol buffer).

Efficiency of second strand synthesis was determined by the mass yieldof WTA product (See FIG. 9B), as well as counting control spike-inmolecules with the Pixel instrument (See FIG. 8B).

Example 4: Homopolymer Tailing

This example describes the homopolymer tailing method of the disclosure. Ninety-six 10 μl reverse transcription reactions, each made up of 1 ng lymphocyte total RNA, 12.5 nM RT primer pool (e.g., comprising stochastic barcodes of the disclosure), 500 μM dNTPs, 1× Protoscript buffer, 40U ProtoscriptII reverse transcriptase (NEB), and 4U Murine RNase Inhibitor (NEB), were carried out at 42° C. for 30 minutes and heat inactivated at 80° C. for 5 min.

Reactions were combined into eight tubes and purified using a 1× ratio of AmpureXP beads. cDNA was eluted in 20 μl ExoI digestion mix (1× CutSmart buffer, 20 ExonucleaseI (NEB)) and incubated at 37° C. for 30 minutes followed by inactivation at 80° C. for 20 minutes.

5 μl Tailing mix (1× CutSmart buffer, 1.25 mM dATP, 12.5 μM ddATP, 10U Terminal transferase (NEB), 1U RNaseH (NEB)) was added to each reaction and incubated at 37° C. for 15 minutes followed by inactivation at 75° C. for 10 minutes.

25 μl PCR Mastermix (1× Q5 buffer, 400 μM dNTPs, 100 nM CBO16 TCAGACGTGTGCTCTTCCGATCTgcgatcgcTTTTTTTTTTTTTTTTTTTTTTTT (SEQ ID NO: 11), 1.76 μM CBO20 TCAGACGTGTGCTCTTCCGAT, 1U Q5 HotStart Polymerase (NEB)) was added to each reaction and thermocycled with the program: 98° C. 2 min, 45° C. 30 s, 72° C. 2 min, 15× of (98° C. 10 s, 65° C. 15 s, 72° C. 1.5 min), and 72° C. 1.5 min. PCR reactions were pooled together, cleaned with a 0.7× ratio of AmpureXP beads, and eluted in 50 μl 10 mM Tris pH8.0, 0.05% Tween-20. The total reaction yield was ˜1.5 μg.
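As an aid to following the volumes in this example, the final concentrations in the amplification reaction can be estimated with the usual dilution relation C1·V1 = C2·V2. The sketch below is illustrative only. It assumes that each reaction holds 20 μl after ExoI digestion, 25 μl after the tailing mix is added, and 50 μl after the PCR Mastermix is added, and that no volume is lost to evaporation or bead carryover; the function name is hypothetical.

```python
def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Concentration after dilution, from C1 * V1 = C2 * V2 (units follow stock_conc)."""
    return stock_conc * added_volume_ul / total_volume_ul

# Assumed volumes: 20 ul ExoI digest + 5 ul tailing mix + 25 ul PCR Mastermix.
mastermix_ul = 25
total_ul = 20 + 5 + mastermix_ul

# Concentrations stated for the PCR Mastermix in this example.
pcr_mastermix = {
    "dNTPs (uM)": 400.0,
    "CBO16 (nM)": 100.0,
    "CBO20 (uM)": 1.76,
}

final = {
    name: final_concentration(conc, mastermix_ul, total_ul)
    for name, conc in pcr_mastermix.items()
}

for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

Under these assumptions each Mastermix component is diluted two-fold in the final reaction.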

The WTA product (e.g., quasi-symmetric stochastically barcoded nucleic acid) was diluted and run on the high sensitivity DNA assay of an Agilent Bioanalyzer as shown in FIG. 10.

Example 5: Library Generation Protocol Using Homopolymer Tailing

This example describes library generation of the WTA product from the homopolymer tailing method (e.g., described in Example 4). 100 ng of WTA product was digested in 20 μl with 10U AsiSI (NEB) in 1× CutSmart Buffer at 37° C. for 30 min. Reactions were heated to 95° C. for 2 min, and then placed on ice for 5 min.

30 μl Random priming mix (1× CutSmart buffer, 333 μM dNTPs, 1.66 μM CBO12 CCCTACACGACGCTCTTCCGATCTNNNNNN (SEQ ID NO: 12), 5U Klenow(exo−) (NEB)) was added to each reaction and incubated at 37° C. for 30 min.

Volume was adjusted to 150 μl and bound to 120 μl AmpureXP beads for 5 min. Beads were bound to a magnet, the supernatant was transferred to a new tube, and the beads were discarded. The supernatant was then purified with 60 μl AmpureXP beads and eluted in 24 μl water. 28 μl PCR mix (1.79× Q5 buffer, 357 nM dNTP, 800 nM CBO32 AATGATACGGCGACCACCGAGATCTACACTATAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T (SEQ ID NO: 13), 800 nM CBO33 CAAGCAGAAGACGGCATACGAGATCGAGTAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC*T (SEQ ID NO: 14), and 1U Q5 HotStart DNA Polymerase (NEB)) was added to each reaction and thermocycled with the program: 98° C. 30 s, 15× of (98° C. 10 s, 65° C. 15 s, 72° C. 20 s) and 72° C. 2 min.
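The two bead additions above amount to a double-sided size selection, which is commonly described by the bead-to-sample volume ratio at each step. The following sketch computes those ratios from the volumes given in this example. It assumes the ratios are expressed relative to the original 150 μl sample volume (a common convention, not one stated in this disclosure); the function name is hypothetical.

```python
def bead_ratio(bead_volume_ul, sample_volume_ul):
    """Bead-to-sample volume ratio, e.g. 0.8 for a '0.8x' cleanup."""
    return bead_volume_ul / sample_volume_ul

sample_ul = 150.0        # reaction volume after adjustment
first_beads_ul = 120.0   # first addition; these beads are discarded
second_beads_ul = 60.0   # second addition to the transferred supernatant

# First cut: material bound at this ratio is discarded with the beads.
first_ratio = bead_ratio(first_beads_ul, sample_ul)

# Second cut: cumulative bead volume relative to the original sample volume.
cumulative_ratio = bead_ratio(first_beads_ul + second_beads_ul, sample_ul)

print(f"first cut: {first_ratio:.1f}x, cumulative: {cumulative_ratio:.1f}x")
```

Under this convention the library retained on the second set of beads corresponds to material that stays in solution at about 0.8× but binds at about 1.2×.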

PCR reactions were cleaned with a 1× ratio of AmpureXP beads and eluted in 30 μl 10 mM Tris pH8.0, 0.05% Tween-20. The final product concentration was ˜20 ng/μl.

Sequencing library was diluted and run on the high sensitivity DNA assay of an Agilent Bioanalyzer as shown in FIG. 11.

Example 6: Removal of Restriction Enzyme Site

This example compares various methods to remove the restriction enzyme binding site from the quasi-symmetric stochastically barcoded nucleic acid product of the disclosure, thereby breaking the symmetry of the molecule.

Libraries were created using different protocols as described herein to remove the tagging sequence from libraries before, during, or after library construction. FIG. 12 shows a comparison of the demultiplexing results of the sequencing runs from WTA samples made with different restriction enzyme binding site removal protocols. Group 1 of FIG. 12 were samples made by the homopolymer tailing and restriction digestion protocol of the disclosure. The first bar of the triplicate indicates the percent of reads that mapped to a valid stochastic barcode (e.g., a stochastic barcode used in the assay). The second bar of the triplicate indicates sequences unique to the tagging primer/ligated adapter (e.g., the restriction cleavage site directly after the WTA primer annealing site). This indicates reads that were on the 5′ end.

The third bar of the triplicate indicates the percent of reads that were of unknown origin. Group 2 of FIG. 12 was a WTA sample prepared with homopolymer tailing and digested with 10 U of AsiSI. Group 3 of FIG. 12 was a WTA sample prepared with homopolymer tailing with 10 U of the AsiSI restriction enzyme included during the random priming step after the WTA amplification. Group 4 of FIG. 12 was a WTA sample prepared with homopolymer tailing with 10 U of the AsiSI restriction enzyme included during the random priming step after the WTA amplification and was heat inactivated. Group 5 of FIG. 12 was a WTA sample prepared with an alternate tagging primer of the disclosure (CBO10 TCAGACGTGTGCTCTTCCGATgcgatcgcTTTTTTTTTTTTTTTTTTTTTTTT (SEQ ID NO: 15)). The primer has a 2 nt mismatch to the 3′ most nucleotides of the WTA amplification primer. The library was constructed omitting AsiSI digestion.

The base distribution of the sequencing reads from the library prepared in FIG. 14 is shown in FIGS. 13A and 13B. The bases are distributed evenly, indicating that there is no contaminating AsiSI sequence from the WTA primer (e.g., second universal primer). The library from FIG. 14 was made using a combination of CBO10 and simultaneous AsiSI treatment, like group 3.

Example 7: Combination of Adaptor Addition Methods and Simultaneous Restriction Enzyme Treatment

This example shows the percentage of reads based on the first 8 bp of the sequence mapping to either a valid stochastic barcode, the WTA primer (e.g., second universal primer) that comprises the restriction enzyme binding site (i.e., immediately downstream of the read2 sequence), or an unknown sequence.
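The read classification described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used to produce the figures. It assumes exact matching of the first 8 bases, takes the 8 bp AsiSI recognition sequence GCGATCGC as the signature of reads that begin at the WTA primer, and uses a small hypothetical barcode list in place of the real set of valid stochastic barcodes; all function and variable names are hypothetical.

```python
from collections import Counter

ASISI_SITE = "GCGATCGC"  # AsiSI recognition sequence (8 bp)

# Hypothetical 8 bp barcode whitelist, for illustration only.
EXAMPLE_BARCODES = {"ACGTACGT", "TTGCAAGC"}

CLASSES = ("valid_barcode", "wta_primer", "unknown")

def classify_read(read, valid_barcodes, primer_tag=ASISI_SITE, prefix_len=8):
    """Assign a read to one of three classes from its first prefix_len bases."""
    prefix = read[:prefix_len].upper()
    if prefix in valid_barcodes:
        return "valid_barcode"
    if prefix == primer_tag:
        return "wta_primer"
    return "unknown"

def percent_by_class(reads, valid_barcodes):
    """Percentage of reads falling in each class (all zeros for no reads)."""
    counts = Counter(classify_read(r, valid_barcodes) for r in reads)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CLASSES}
    return {name: 100.0 * counts[name] / total for name in CLASSES}

example_reads = [
    "ACGTACGTGGCATT",  # starts with a whitelisted barcode
    "GCGATCGCTTTTTT",  # starts with the AsiSI site
    "NNNNNNNNACGTAC",  # neither
    "TTGCAAGCCCATGA",  # starts with a whitelisted barcode
]
percentages = percent_by_class(example_reads, EXAMPLE_BARCODES)
print(percentages)
```

A practical implementation would likely tolerate sequencing errors in the prefix, which this sketch does not attempt.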

WTA products were generated either by the homopolymer tailing method, or by 2^(nd) strand/ligation protocols. The homopolymer protocol utilized a tagging primer with 2 nt of mismatch to the 3′ end of the library amplification primer, while the ligation based method described herein had 5 nt of mismatch.
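The 2 nt mismatch of the homopolymer tagging primer can be checked by direct sequence comparison. The sketch below assumes that the relevant 3′ end of the library amplification primer is the last 23 nt of CBO33 (Example 5), and compares it position by position with the 5′ portions of tagging primers CBO16 (Example 4) and CBO10 (Example 6), with their poly(T) tails omitted. The function name and this choice of comparison window are illustrative assumptions.

```python
# Last 23 nt of the library amplification primer CBO33 (assumed comparison window).
LIBRARY_PRIMER_3P = "TCAGACGTGTGCTCTTCCGATCT"

# 5' portions of the tagging primers (poly(T) tails omitted).
CBO16_5P = "TCAGACGTGTGCTCTTCCGATCTgcgatcgc"
CBO10_5P = "TCAGACGTGTGCTCTTCCGATgcgatcgc"

def mismatch_positions(primer, tagged_sequence):
    """1-based positions where primer differs from the start of tagged_sequence."""
    window = tagged_sequence[:len(primer)].upper()
    return [
        i + 1
        for i, (p, t) in enumerate(zip(primer.upper(), window))
        if p != t
    ]

cbo16_mismatches = mismatch_positions(LIBRARY_PRIMER_3P, CBO16_5P)
cbo10_mismatches = mismatch_positions(LIBRARY_PRIMER_3P, CBO10_5P)

print("CBO16:", cbo16_mismatches)
print("CBO10:", cbo10_mismatches)
```

Under these assumptions CBO16 matches the primer over the whole window, whereas CBO10 differs at the last two positions, which is where a mismatch most strongly impairs polymerase extension.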

Libraries were generated by random priming with Klenow(Exo−), but with the inclusion of 10U AsiSI during the primer extension. Libraries were sequenced with paired end chemistry on an Illumina MiSeq.

As shown in FIG. 14, combining enzyme digestion with 3′ mismatches to the library amplification primer performed the best, with more mismatches being better.

Example 8: Adaptor Ligation Method on Beads

This example illustrates the method and data for ligating an adaptor of the disclosure to the second strand of a barcoded cDNA on a bead. The method comprises:

Bead Preparation: 40 μl of 2.5M beads/ml Resolve beads were put in a 1.5 ml tube. The beads were bound to a magnet and the supernatant was removed. The beads were washed 2× with 200 μl RNase-free water. The beads were resuspended in 21.5 μl Annealing mix and incubated for 3 min at 65° C. in the heat block with occasional vortexing. The beads were placed on ice.

1^(st) Strand Synthesis: 18.5 μl RT Mastermix was added to the beads. The reaction mixture was incubated at 42° C. for 10 min in the thermomixer and at 80° C. for 5 min on a heat block. The beads were bound to a magnet to remove the supernatant.

2^(nd) Strand Synthesis: 80 μl 2nd Strand Synthesis Mix was added to the beads. The reaction mixture was incubated for 2.5 hrs at 16° C. in the thermomixer, and 1 μl 3U/μl T4 DNA Polymerase was added and incubated for 5 min at 16° C. (to blunt the cDNA). The reaction mixture was transferred to ice, and 5 μl 0.5M EDTA was added and mixed to stop all enzymatic activity, followed by 1× wash with 100 μl 10 mM Tris pH8.0.

Adaptor Ligation: 50 μl Ligation Mix was added to each tube and incubated for 30 min at room temperature on the rotator. The beads were bound to a magnet, washed 2× with 200 μl Tris-Tween, resuspended in 50 μl Tris-Tween, and stored at 4° C.

WTA Amplification: WTA PCR reactions were set up with 1 μl resuspended beads in WTA Mastermix and PCR was conducted.

Annealing mix
1X        1X        Reagent
1 μl      1 μl      1 μg/μl Ambion Human Brain RNA (not HBRR)
1 μl      1 μl      1 pg/μl spike-in
19.5 μl   19.5 μl   water
21.5 μl   21.5 μl   Total

RT Mastermix
1X        1X        Reagent
8 μl      8 μl      5X Protoscript buffer
4 μl      4 μl      1 mg/ml BSA
2 μl      2 μl      10 mM dNTP
2 μl      2 μl      100 mM DTT
2 μl      2 μl      ProtoscriptII
0.5 μl    0.5 μl    Murine Rnasin
18.5 μl   18.5 μl   Total

2nd Strand Synthesis Mix
1X        1X        Reagent
60 μl     60 μl     water
8 μl      8 μl      1 mg/ml BSA
8 μl      8 μl      10X 2nd Strand Synthesis Buffer
4 μl      4 μl      2nd Strand Enzyme Mix
80 μl     80 μl     Total

Ligation Mix
1X        1X        Reagent
32 μl     32 μl     water
5 μl      5 μl      1 mg/ml BSA
10 μl     10 μl     5X Quick Ligation Reaction Buffer
1 μl      1 μl      Quick T4 DNA Ligase
2 μl      2 μl      5 μM Annealed CBO123/106
50 μl     50 μl     Total

WTA Mastermix
1X        2X        Reagent
21.5 μl   43 μl     Water
1 μl      2 μl      Ligated Resolve Beads
2.5 μl    5 μl      30 μM CBO40
25 μl     50 μl     2X Q5 HotStart Mastermix
50 μl     100 μl    Total

Cycling Conditions for PCR
Temperature   Time     Cycles
98° C.        30 s
98° C.        10 s     20X
65° C.        15 s     20X
72° C.        3 min    20X
72° C.        5 min
4° C.         hold
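The number of beads carried into each WTA PCR can be estimated from the volumes in the protocol above. The sketch below is illustrative only. It assumes that "2.5M beads/ml" denotes 2.5 million beads per milliliter, that no beads are lost during washing, and that the beads are evenly suspended when the 1 μl aliquot is taken; the function name is hypothetical.

```python
def beads_in_volume(beads_per_ml, volume_ul):
    """Number of beads contained in volume_ul of a suspension."""
    return beads_per_ml * volume_ul / 1000

BEADS_PER_ML = 2_500_000   # assumed meaning of "2.5M beads/ml"
input_volume_ul = 40       # bead suspension used in Bead Preparation
resuspension_ul = 50       # Tris-Tween volume after Adaptor Ligation
pcr_aliquot_ul = 1         # volume of resuspended beads per WTA PCR

total_beads = beads_in_volume(BEADS_PER_ML, input_volume_ul)
beads_per_pcr = total_beads * pcr_aliquot_ul / resuspension_ul

print(f"beads processed: {total_beads:.0f}")
print(f"beads per WTA PCR: {beads_per_pcr:.0f}")
```

Under these assumptions about 100,000 beads enter the protocol and about 2,000 beads are present in each 50 μl WTA PCR.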

The results of the above method are shown in FIG. 22. The gel depicted on the left shows the WTA amplification product. The Q5 wash and KAPA wash lanes show the WTA amplification product, as the smear represents the amplified transcriptome. Removal of 2^(nd) strand enzymes by washing was much more effective than heat inactivation. No major difference was seen between the polymerases.

The samples shown on the left were quantified for efficiency using the Pixel™ method as described herein.

Example 9: Characterization of the Suppressive Adaptor

This example shows the characterization of the suppression capabilities of different adaptors. The method followed the adaptor ligation WTA protocol (Example 8) with either the high or low suppression adapter, followed by amplification with varying concentrations of WTA primer. The method comprises:

Bead Preparation: 40 μl of 2.5M beads/ml Resolve beads were put in a 1.5 ml tube. The beads were bound to a magnet and the supernatant was removed. The beads were washed 2× with 200 μl RNase-free water. The beads were resuspended in 21.5 μl Annealing mix and incubated for 3 min at 65° C. in the heat block with occasional vortexing. The beads were placed on ice.

1^(st) Strand Synthesis: 18.5 μl RT Mastermix was added to the beads. The reaction mixture was incubated at 42° C. for 10 min in the thermomixer and at 80° C. for 5 min on a heat block. The beads were bound to a magnet to remove the supernatant.

2^(nd) Strand Synthesis: 80 μl 2nd Strand Synthesis Mix was added to the beads. The reaction mixture was incubated for 2.5 hrs at 16° C. in the thermomixer, and 1 μl 3U/μl T4 DNA Polymerase was added and incubated for 5 min at 16° C. (to blunt the cDNA). The reaction mixture was transferred to ice, and 5 μl 0.5M EDTA was added and mixed to stop all enzymatic activity, followed by 1× wash with 100 μl 10 mM Tris pH8.0. The cleaned 2nd strand reaction was split into two tubes.

Adaptor Ligation: 50 μl Ligation Mix was added to each tube and incubated for 30 min at room temperature on the rotator. The beads were bound to a magnet, washed 2× with 200 μl Tris-Tween, resuspended in 50 μl Tris-Tween, and stored at 4° C.

WTA Amplification: WTA PCR reactions were set up with 1 μl resuspended beads in WTA Mastermix and PCR was conducted.

Annealing mix
1X        1X        Reagent
1 μl      1 μl      1 μg/μl Ambion Human Brain RNA (not HBRR)
1 μl      1 μl      1 pg/μl spike-in
19.5 μl   19.5 μl   water
21.5 μl   21.5 μl   Total

RT Mastermix
1X        1X        Reagent
8 μl      8 μl      5X Protoscript buffer
4 μl      4 μl      1 mg/ml BSA
2 μl      2 μl      10 mM dNTP
2 μl      2 μl      100 mM DTT
2 μl      2 μl      ProtoscriptII
0.5 μl    0.5 μl    Murine Rnasin
18.5 μl   18.5 μl   Total

2nd Strand Synthesis Mix
1X        1X        Reagent
60 μl     60 μl     water
8 μl      8 μl      1 mg/ml BSA
8 μl      8 μl      10X 2nd Strand Synthesis Buffer
4 μl      4 μl      2nd Strand Enzyme Mix
80 μl     80 μl     Total

Ligation Mix
1X        1X        Reagent
32 μl     32 μl     water
5 μl      5 μl      1 mg/ml BSA
10 μl     10 μl     5X Quick Ligation Reaction Buffer
1 μl      1 μl      Quick T4 DNA Ligase
2 μl      2 μl      5 μM Annealed CBO122/103 or CBO123/105
50 μl     50 μl     Total

WTA Mastermix
1X        2X        Reagent
21.5 μl   43 μl     Water
1 μl      2 μl      Ligated Resolve Beads
2.5 μl    5 μl      30 μM, 10 μM, or 5 μM CBO40
25 μl     50 μl     2X Q5 HotStart Mastermix
50 μl     100 μl    Total

Cycling Conditions for PCR
Temperature   Time     Cycles
98° C.        30 s
98° C.        10 s     20X
65° C.        15 s     20X
72° C.        3 min    20X
72° C.        5 min
4° C.         hold
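For planning purposes, the cycling conditions above can be written as a small data structure and the programmed hold time summed. This is a sketch under the assumption that the 20X bracket covers the 98° C./65° C./72° C. three-step cycle; it counts only programmed hold times and ignores ramping between temperatures and the final 4° C. hold. The names are hypothetical.

```python
# Each stage is (number_of_cycles, [(temperature_C, seconds), ...]).
PROGRAM = [
    (1, [(98, 30)]),                        # initial denaturation
    (20, [(98, 10), (65, 15), (72, 180)]),  # assumed 20X three-step cycle
    (1, [(72, 300)]),                       # final extension
]

def programmed_seconds(program):
    """Sum of hold times across all stages, excluding ramp time."""
    return sum(
        cycles * sum(seconds for _temperature, seconds in steps)
        for cycles, steps in program
    )

total_seconds = programmed_seconds(PROGRAM)
print(f"programmed time: {total_seconds} s ({total_seconds / 60:.1f} min)")
```

Under these assumptions the programmed time is 4430 s, or roughly 74 minutes before ramping is taken into account.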

The results of the method are shown in FIG. 23. H refers to high suppression adaptor. L refers to low suppression adaptor. The primer concentration variation is shown beneath each lane. FIG. 23 shows that while all samples had similar levels of transcriptome amplification, the high suppression adapter at lower primer concentrations had the fewest primer artifacts (as evidenced by the lack of a primer-dimer band at the bottom of the gel).

Example 10: Testing Bead Capacity for WTA Amplification

This example describes experiments done to test the bead capacity for WTA amplification. The method comprises:

Bead Preparation: 80 μl of 2M beads/ml uncoupled beads were put in a 1.5 ml tube and bound to a magnet to remove the supernatant. The beads were washed 2× with 200 μl RNase-free water and resuspended in two 10 μl aliquots in water.

WTA Amplification: WTA PCR reactions were set up with 1 μl resuspended beads.

Tube   WTA beads   Bare beads   Water   Polymerase
1      0           0            20      Q5
2      1           0            19      Q5
3      10          0            10      Q5
4      10          10           0       Q5
5      0           0            20      KAPA
6      1           0            19      KAPA
7      10          0            10      KAPA
8      10          10           0       KAPA

WTA Mastermix
1X        2X        Reagent
2.5 μl    5 μl      Water
2.5 μl    5 μl      5 μM CBO40
25 μl     50 μl     2X Q5 HotStart Mastermix or KAPA Hifi
30 μl     60 μl     Total

Cycling Conditions
Temperature   Time     Cycles
98° C.        2 min
98° C.        20 s     20X
58° C.        15 s     20X
72° C.        3 min    20X
72° C.        5 min
4° C.         hold

Bead Preparation: Beads were in ˜100 μl Tris-Tween. Two 10 μl bead amplifications were performed, one with and one without BSA.

WTA Amplification: WTA PCR reactions were set up with 1 μl resuspended beads.

WTA Mastermix
1X        2X        Reagent
7.5 μl    15 μl     Water
10 μl     20 μl     Resolve Beads
5 μl      10 μl     10X BSA or water
2.5 μl    5 μl      5 μM CBO40
25 μl     50 μl     2X KAPA Hifi
50 μl     100 μl    Total

Cycling Conditions
Temperature   Time     Cycles
98° C.        2 min
98° C.        20 s     20X
58° C.        15 s     20X
72° C.        3 min    20X
72° C.        5 min
4° C.         hold

The results of the method are shown in FIG. 24. As shown on the left, different amounts of beads with cDNA transcribed from bulk RNA (cDNA) were mixed with uncoupled beads (Empty) (e.g., beads without conjugated oligonucleotides) to mimic the loaded and unloaded beads from a typical experiment. Different amounts of the beads were added to a standard 50 μl PCR reaction with either Q5 polymerase 2× mastermix or KAPA Hifi polymerase. In both cases, the maximum amount of beads per PCR was ˜10,000.

As shown on the right in FIG. 24, the WTA protocol was performed on a sample that had been contacted with cells (as opposed to the left image of FIG. 24, which had no cells). Then about 10,000 beads from the experiment were added to each of two PCR reactions, either with or without BSA. The BSA was used to counter bead inhibition. The smear above the primer dimers indicates amplification of some RNA species.

The samples above were used to prepare a library. The WTA product and the library were analyzed with the Bioanalyzer. The protocol for generating the library comprises: Annealing mix was added to 50 ng WTA product, heated to 95° C. for 2 min and cooled to 4° C. for 5 min. 7 μl Klenow(exo−) mastermix was added to each tube. The reaction mixture was incubated at 37° C. for 30 min and at 80° C. for 20 min, and cleaned up with 35 μl AmpureXP, followed by elution in 22 μl water.

Library Preparation: 28 μl Q5 Mastermix was added, and PCR was conducted using the cycling conditions below. Clean up was done with the same ratio of beads as used for library size selection, and the product was eluted in 30 μl Tris-Tween.

Annealing mix
1X        2.5X       Reagent
5 μl      12.5 μl    10 μM R2-N9 (CBO121)
1 μl      2.5 μl     10 mM dNTP
5 μl      12.5 μl    WTA Product
32 μl     80 μl      Water
43 μl     107.5 μl   Total

Klenow Mastermix
1X        3X        Reagent
5 μl      15 μl     10X NEB2.1
1 μl      3 μl      AsiSI
1 μl      3 μl      Klenow exo- (NEB)
7 μl      21 μl     Total

Q5 Mastermix
1X        2.5X      Reagent
25 μl     62.5 μl   2X Q5 HotStart Mastermix
1.5 μl    3.75 μl   D501
1.5 μl    3.75 μl   D70X
28 μl     62.5 μl   Total

Cycling Conditions
Temperature   Time     Cycles
98° C.        30 s
98° C.        10 s     12X
65° C.        15 s     12X
72° C.        20 s     12X
72° C.        2 min
4° C.         hold

Tube   Index
1      D707
2      D708

As shown in FIG. 25, the left trace shows the WTA products resulting from the right-hand side of FIG. 24. The right trace in FIG. 25 is from the library of WTA product. The product for each was the right size, as evidenced by the trace. This indicates that the right sequences were added in the WTA ligation and library preparation methods and that a valid library can be prepared from solid supports.

Example 12: Analysis of Exonuclease Treatment

This example was performed to test the effect of exonuclease treatment on the beads using the adaptor ligation methods of the disclosure. The method comprises:

Bead Preparation: 16 μl of 2.5M beads/ml Resolve beads were put in a 1.5 ml tube. 144 μl beads were put in a separate tube. The beads were bound to a magnet and the supernatant was removed. The beads were resuspended in 100 μl Tris-Tween and heated to 95° C. for 2 min. The beads were washed 2× with 200 μl RNase-free water (the 144 μl beads were set aside for later). The beads were resuspended in 21.5 μl Annealing mix and incubated for 3 min at 65° C. in the heat block with occasional vortexing. The beads were placed on ice.

1^(st) Strand Synthesis: 18.5 μl RT Mastermix was added to the beads. The reaction mixture was incubated at 42° C. for 30 min in the thermomixer and at 80° C. for 5 min on a heat block. The beads were bound to a magnet to remove the supernatant. The beads from the 1st strand reaction were mixed with the no RT beads, and washed 2× with 200 μl Tris-Tween. After resuspending for the second wash, the beads were split into four 50 μl tubes. For tubes #3 and #4 only (the other tubes went straight to the 2nd strand reaction), the beads were bound to a magnet and the supernatant was removed. The beads were resuspended in 40 μl ExoI mastermix. The reaction mixture was incubated at 37° C. for 30 min and at 80° C. for 20 min. The beads were washed 1× in 200 μl Tris-Tween.

2^(nd) Strand Synthesis: 80 μl 2nd Strand Synthesis Mix was added to the beads. 20 μl 2nd Strand Buffer Mix without enzyme was added. The reaction mixture was incubated for 2.5 hrs at 16° C. in the thermomixer, and 1 μl 3U/μl T4 DNA Polymerase was added to tube #1 (or #3) and incubated for 5 min at 16° C. (to blunt the cDNA). The reaction mixture was transferred to ice, and 5 μl 0.5M EDTA was added and mixed to stop all enzymatic activity, followed by washing 2× in 100 μl Wash Buffer (10 mM Tris pH8.0, 150 mM NaCl, 5 mM EDTA, 0.05% Tween-20) and 2× in 100 μl Tris-Tween to remove 2nd strand enzymes.

Adaptor Ligation: 50 μl Ligation Mix was added to each tube and incubated for 30 min at room temperature on the rotator. The beads were bound to a magnet, washed 2× with 200 μl Tris-Tween, resuspended in 50 μl Tris-Tween, and stored at 4° C.

WTA Amplification: WTA PCR reactions were set up with 5 μl resuspended beads in WTA Mastermix and PCR was conducted.

Annealing mix
1X        1X         Reagent
4 μl      4 μl       1 ng/μl K562 RNA
0.5 μl    0.5 μl     1 pg/μl spike-in
17 μl     17 μl      water
21.5 μl   53.75 μl   Total

RT Mastermix
1X        2.5X       Reagent
8 μl      20 μl      5X Protoscript buffer
4 μl      10 μl      1 mg/ml BSA
2 μl      5 μl       10 mM dNTP
2 μl      5 μl       100 mM DTT
2 μl      5 μl       ProtoscriptII
0.5 μl    1.25 μl    Murine Rnasin
18.5 μl   46.25 μl   Total

ExoI Mastermix
1X        2.5X      Reagent
4 μl      10 μl     10X ExoI buffer
2 μl      5 μl      ExoI
34 μl     85 μl     Water
40 μl     100 μl    Total

2nd Strand Synthesis Mix
1X        2.2X      Reagent
8 μl      17.6 μl   1 mg/ml BSA
8 μl      17.6 μl   10X 2nd Strand Synthesis Buffer
4 μl      8.8 μl    2nd Strand Enzyme Mix
60 μl     132 μl    water
16 μl     35.2 μl   Total

Ligation Mix
1X        2.2X      Reagent
32 μl     70.4 μl   water
5 μl      11 μl     1 mg/ml BSA
10 μl     22 μl     5X Quick Ligation Reaction Buffer
1 μl      2.2 μl    Quick T4 DNA Ligase
2 μl      4.4 μl    5 μM Annealed CBO122/103
50 μl     110 μl    Total

WTA Mastermix
1X        4.2X      Reagent
17.5 μl   73.5 μl   Water
2.5 μl    10.5 μl   5 μM CBO40
25 μl     105 μl    2X KAPA Hifi
45 μl     189 μl    Total

Cycling Conditions
Temperature   Time     Cycles
98° C.        2 min
98° C.        20 s     20X
65° C.        15 s     20X
72° C.        3 min    20X
72° C.        5 min
4° C.         hold
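The scaled volumes in the mastermix tables of this example follow from multiplying each per-reaction (1X) volume by the number of reactions being prepared. The sketch below reproduces the 2.5X column of the RT Mastermix from its 1X column. It is illustrative only; the function name and the dictionary layout are hypothetical.

```python
def scale_recipe(recipe_1x_ul, multiplier):
    """Scale per-reaction volumes (ul) to a mastermix and append the total."""
    scaled = {reagent: volume * multiplier for reagent, volume in recipe_1x_ul.items()}
    scaled["Total"] = sum(scaled.values())
    return scaled

# 1X volumes (ul) of the RT Mastermix from this example.
rt_mastermix_1x = {
    "5X Protoscript buffer": 8,
    "1 mg/ml BSA": 4,
    "10 mM dNTP": 2,
    "100 mM DTT": 2,
    "ProtoscriptII": 2,
    "Murine Rnasin": 0.5,
}

rt_mastermix_2_5x = scale_recipe(rt_mastermix_1x, 2.5)

for reagent, volume in rt_mastermix_2_5x.items():
    print(f"{volume:>6g} ul  {reagent}")
```

The same function can be used to cross-check the other tables; a scaled total that does not equal the multiplier times the 1X total suggests a transcription error in that table.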

The results of the experiment are shown in FIG. 28. The no treatment trace is shown on the left. The cDNA sample treated with exonuclease is shown on the right. Treatment occurred before proceeding with second strand synthesis and adaptor ligation. WTA PCR showed that there was a much higher yield when there was an ExoI treatment.

Example 13: WTA Analysis Using 1 ng and 10 pg RNA

This example describes the methods of the disclosure using primers unconjugated to beads. The protocol for analyzing 1 ng RNA is described in FIGS. 29A and 29B. The protocol for analyzing 10 pg RNA is described in FIGS. 30A and 30B.

Both 1 ng/well and 10 pg/well produced a proper WTA product, as evidenced by the correct size distribution of the Bioanalyzer traces shown in FIG. 18.

Furthermore, the 1 ng/well and 10 pg/well experiments corresponded in the number of reads obtained. FIG. 19 shows the number of reads for each gene in the assay. The number of reads for both samples was similar. This showed the robustness of the assay.

The sequencing output from the reads (e.g., FIG. 19) was graphically displayed, as shown in FIG. 20. Left, a PCA plot that clearly separates the UHRR (universal human reference RNA) wells from the Human Brain Reference RNA (HBRR) wells. Right, a heatmap of the genes used for PCA along with hierarchical clustering that shows all of the like RNA types clustering together. FIG. 20 shows that different cell types can be identified using the methods of the disclosure.

Example 14: Comparison of Ligation-Based and Transposome-Based WTA

The Nextera XT or Nextera kit (Illumina) was used to generate sequencing libraries directly from beads. The Tn5 transposase was used to randomly fragment captured mRNA (either still in original form in an RNA/DNA complex, or converted to cDNA) and attach adapters. Primers that can hybridize to these adapter sequences were used to attach P5 and P7 indices for multiplexed Illumina sequencing runs. No further library preparation was required. While proof of principle experiments were done with commercial Nextera sequencing library kits, only the transposome component of the kit was used. Transposases can be purchased commercially and transposomes can be custom built to allow the attachment of any oligo sequence of choice (in this case, a universal sequence). If custom built, transposomes can be built with oligos containing a unique molecular index, such that when the transposase cuts the DNA, a unique molecular index is attached in addition to the universal sequence. This will add a second molecular index to each of the molecules (one molecular index having already been added as the cDNA is transcribed using the primer on the Resolve bead).

FIG. 33 shows the PCA plots for WTA analysis of 3 cell types using either ligation-based or transposome-based protocols with beads. The results show that with or without second strand synthesis, the transposome-based WTA analysis was able to distinguish the 3 cell types based on WTA. Ligation-based WTA analysis also was able to distinguish the 3 cell types.

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference for all purposes to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

In at least some of the previously described embodiments, one or more elements used in an embodiment can interchangeably be used in another embodiment unless such a replacement is not technically feasible. It will be appreciated by those skilled in the art that various other omissions, additions and modifications may be made to the methods and structures described above without departing from the scope of the claimed subject matter. All such modifications and changes are intended to fall within the scope of the subject matter, as defined by the appended claims.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to,” “at least,” “greater than,” “less than,” and the like include the number recited and refer to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

What is claimed is:
 1. A kit, comprising: a plurality of solid supports, wherein each of the solid supports comprises a plurality of stochastic barcodes each comprising a first universal label; an adaptor comprising a second universal label; and an enzyme, wherein the enzyme is a ligase or a transposase.
 2. The kit of claim 1, wherein said enzyme is a ligase.
 3. The kit of claim 2, wherein the ligase is a T4 DNA ligase.
 4. The kit of claim 1, wherein said enzyme is a transposase.
 5. The kit of claim 4, wherein said transposase is a Tn5 transposase, or a hyperactive derivative thereof.
 6. The kit of claim 1, wherein said plurality of solid supports comprises one or more discrete particles of plastic, ceramic, metal, or polymeric material.
 7. The kit of claim 6, wherein the one or more discrete particles are beads.
 8. The kit of claim 7, wherein the one or more beads are hydrogel beads.
 9. The kit of claim 1, further comprising a microwell array.
 10. The kit of claim 1, wherein said second universal label is at most 99% identical to said first universal label.
 11. The kit of claim 1, wherein the stochastic barcodes are covalently immobilized to the solid support.
 12. The kit of claim 1, wherein the stochastic barcodes are non-covalently immobilized to the solid support.